Proposal: Adding compression of temporary files

Started by Filip Janus · 3 months ago · 32 messages · hackers
#1 Filip Janus
fjanus@redhat.com

Hi all,
PostgreSQL supports data compression nowadays, but compression of
temporary files has not been implemented yet. Huge queries can produce a
significant amount of temporary data that needs to be stored on disk,
causing many expensive I/O operations.
I am attaching a proposed patch that enables temporary file compression,
for hash joins only for now. Initially I chose the LZ4 compression
algorithm; it would probably have made better sense to start with pglz,
but I realized that too late.

# Future possible improvements
Reduce the number of memory allocations when dumping and loading the
buffer. I see two ways to solve this: either add a buffer to struct
BufFile, or have the caller provide the buffer as an argument. For
sequential execution, I would prefer the second option.
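
To illustrate the second option, here is a rough sketch (not the patch's
actual code - the scratch struct and the helper name are made up, while
BufFileWrite() and LZ4_compress_default() are existing APIs) of a dump path
that reuses a caller-owned buffer instead of allocating and freeing one per
block:

#include "postgres.h"
#include "storage/buffile.h"
#include <lz4.h>

/* Hypothetical caller-owned scratch space, allocated once per hash join. */
typedef struct CompressScratch
{
    char   *data;               /* at least LZ4_compressBound(BLCKSZ) bytes */
    int     size;
} CompressScratch;

/* Sketch: compress one raw block and append it, prefixed with its length. */
static void
BufFileDumpCompressedBlock(BufFile *file, CompressScratch *scratch,
                           const char *raw, int rawlen)
{
    int         clen = LZ4_compress_default(raw, scratch->data, rawlen,
                                            scratch->size);

    if (clen <= 0)
        elog(ERROR, "LZ4 compression failed");

    BufFileWrite(file, &clen, sizeof(clen));
    BufFileWrite(file, scratch->data, clen);
}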

# Future plan/open questions
In the future, I would like to add support for pglz and zstd. Further, I
plan to extend temporary file compression to other consumers such as
sorting, GiST index creation, etc.

Another open item is experimenting with the stream mode of the compression
algorithms. The compression ratio of LZ4 in block mode seems satisfactory,
but stream mode could produce a better ratio at the cost of more memory,
because the context for LZ4 stream compression has to be kept around.
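
For context, the difference between the two LZ4 modes looks roughly like
this (standard liblz4 calls; only the wrapper names are mine):

#include <lz4.h>

/* Block mode: each chunk is compressed independently, no state is kept. */
static int
compress_block(const char *src, int srclen, char *dst, int dstcap)
{
    return LZ4_compress_default(src, dst, srclen, dstcap);
}

/*
 * Stream mode: consecutive chunks share a dictionary, which can improve the
 * compression ratio, but the LZ4_stream_t context (and the recently
 * compressed input it references) must stay in memory for the whole file.
 */
static int
compress_block_streamed(LZ4_stream_t *ctx, const char *src, int srclen,
                        char *dst, int dstcap)
{
    return LZ4_compress_fast_continue(ctx, src, dst, srclen, dstcap, 1);
}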

# Benchmark
I prepared three different databases to verify my expectations. Each
dataset is described below. My testing demonstrates that the patch
improves the execution time of huge hash joins and should not negatively
affect the performance of smaller queries. The amount of temporary file
data was reduced in every run, without a significant impact on execution
time.

## Dataset A:
Tables:
table_a(bigint id, text data_text, integer data_number) - 10000000 rows
table_b(bigint id, integer ref_id, numeric data_value, bytea data_blob) - 10000000 rows
Query: SELECT * FROM table_a a JOIN table_b b ON a.id = b.id;

The tables contain highly compressible data. The query demonstrated a
reduction in temporary file usage from ~20GB to ~3GB, which in turn
reduced the execution time of the query by about 10s.
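
For illustration, highly compressible rows like these can be generated
along the following lines (a sketch only, not necessarily how the benchmark
data was built):

-- illustrative only: repeated text makes the table compress very well
INSERT INTO table_a (id, data_text, data_number)
SELECT g, repeat('compressible text ', 50), (g % 1000)::integer
FROM generate_series(1, 10000000) AS g;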

## Dataset B:
Tables:
table_a(integer id, text data_blob) - 1110000 rows
table_b(integer id, text data_blob) - 10000000 rows
Query: SELECT * FROM table_a a JOIN table_b b ON a.id = b.id;

The tables contain less compressible data; data_blob was generated by a
pseudo-random generator. In this case, the data reduction was only ~50%,
and the execution time changed only slightly with compression enabled.

The second scenario (work_mem raised enough to avoid temp file usage
entirely) demonstrates that enabling compression adds no overhead when
nothing actually spills to disk.
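
A sketch of how the poorly compressible data_blob values above could be
generated (again illustrative only):

-- md5() of random input yields hex strings with little redundancy
INSERT INTO table_b (id, data_blob)
SELECT g, md5(random()::text) || md5(random()::text)
FROM generate_series(1, 10000000) AS g;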

## Dataset C:
Tables:
customers(integer, text, text, text, text)
order_items(integer, integer, integer, integer, numeric(10,2))
orders(integer, integer, timestamp, numeric(10,2))
products(integer, text, text, numeric(10,2), integer)

Query:
SELECT p.product_id, p.name, p.price,
       SUM(oi.quantity) AS total_quantity,
       AVG(oi.price) AS avg_item_price
FROM eshop.products p
JOIN eshop.order_items oi ON p.product_id = oi.product_id
JOIN eshop.orders o ON oi.order_id = o.order_id
WHERE o.order_date > '2020-01-01' AND p.price > 50
GROUP BY p.product_id, p.name, p.price
HAVING SUM(oi.quantity) > 1000
ORDER BY total_quantity DESC
LIMIT 100;

This scenario should demonstrate a more realistic usage of the database.
Enabled compression slightly reduced the temporary file usage, but the
execution time wasn't affected by compression.

+---------+-------------+------------+----------------+----------+
| Dataset | Compression | temp_bytes | Execution time | work_mem |
+---------+-------------+------------+----------------+----------+
| A       | Yes         | 3.09 GiB   | 22s 586ms      | 4MB      |
| A       | No          | 21.89 GiB  | 35s            | 4MB      |
+---------+-------------+------------+----------------+----------+
| B       | Yes         | 333 MB     | 1815.545 ms    | 4MB      |
| B       | No          | 146 MB     | 1500.460 ms    | 4MB      |
| B       | Yes         | 0 MB       | 3262.305 ms    | 80MB     |
| B       | No          | 0 MB       | 3174.725 ms    | 80MB     |
+---------+-------------+------------+----------------+----------+
| C       | Yes         | 40 MB      | 1011.020 ms    | 1MB      |
| C       | No          | 53 MB      | 1034.142 ms    | 1MB      |
+---------+-------------+------------+----------------+----------+

Regards,

-Filip-

Attachments:

0001-This-commit-adds-support-for-temporary-files-compres.patch (+200, -16)
#2 Filip Janus
fjanus@redhat.com
In reply to: Filip Janus (#1)
Re: Proposal: Adding compression of temporary files

This version fixes a compiler warning caused by an uninitialized local variable.

-Filip-

čt 14. 11. 2024 v 23:13 odesílatel Filip Janus <fjanus@redhat.com> napsal:

Show quoted text


Attachments:

0001-This-commit-adds-support-for-temporary-files-compres-v2.patch (+200, -16)
#3 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Filip Janus (#2)
Re: Proposal: Adding compression of temporary files

Hi,

On 11/18/24 22:58, Filip Janus wrote:

...
Hi all,
Postgresql supports data compression nowadays, but the compression of
temporary files has not been implemented yet. The huge queries can 
produce a significant amount of temporary data that needs to
be stored on disk 
and cause many expensive I/O operations.
I am attaching a proposal of the patch to enable temporary files
compression for
hashjoins for now. Initially, I've chosen the LZ4 compression
algorithm. It would
probably make better sense to start with pglz, but I realized it late.

Thanks for the idea & patch. I agree this might be quite useful for
workloads generating a lot of temporary files for stuff like sorts etc.
I think it will be interesting to think about the trade-offs, i.e. how
to pick the compression level - at some point the compression ratio
stops improving while paying more and more CPU time. Not sure what the
right choice is, so using default seems fine.

I agree it'd be better to start with pglz, and only then add lz4 etc.
Firstly, pglz is simply the built-in compression, supported everywhere.
And it's also simpler to implement, I think.

# Future possible improvements
Reducing the number of memory allocations within the dumping and
loading of
the buffer. I have two ideas for solving this problem. I would
either add a buffer into
struct BufFile or provide the buffer as an argument from the caller.
For the sequential 
execution, I would prefer the second option.

Yes, this would be good. Doing a palloc+pfree for each compression is
going to be expensive, especially because these buffers are going to be
large - likely larger than 8kB. Which means it's not cached in the
memory context, etc.

Adding it to the BufFile is not going to fly, because that doubles the
amount of memory per file. And we already have major issues with hash
joins consuming massive amounts of memory. But at the same time the
buffer is only needed during compression, and there's only one at a
time. So I agree with passing a single buffer as an argument.

# Future plan/open questions
In the future, I would like to add support for pglz and zstd.
Further, I plan to
extend the support of the temporary file compression also for
sorting, gist index creation, etc.

Experimenting with the stream mode of compression algorithms. The
compression 
ratio of LZ4 in block mode seems to be satisfying, but the stream
mode could 
produce a better ratio, but it would consume more memory due to the
requirement to store
context for LZ4 stream compression.

One thing I realized is that this only enables temp file compression for
a single place - hash join spill files. AFAIK this is because compressed
files don't support random access, and the other places might need that.

Is that correct? The patch does not explain this anywhere. If that's
correct, the patch probably should mention this in a comment for the
'compress' argument added to BufFileCreateTemp(), so that it's clear
when it's legal to set compress=true.

Which other places might compress temp files? Surely hash joins are not
the only place that could benefit from this, right?

Another thing is testing. If I run regression tests, it won't use
compression at all, because the GUC has "none" by default, right? But we
need some testing, so how would we do that? One option would be to add a
regression test that explicitly sets the GUC and does a hash join, but
that won't work with lz4 (because that may not be enabled).

Another option might be to add a PG_TEST_xxx environment variable that
determines compression to use. Something like PG_TEST_USE_UNIX_SOCKETS.
But perhaps there's a simpler way.
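
For instance, a test along these lines would exercise the spill path -
'temp_file_compression' here is just a placeholder for whatever the GUC
ends up being called:

SET temp_file_compression = 'lz4';  -- placeholder GUC name
SET work_mem = '64kB';              -- small enough to force the join to spill
SELECT count(*) FROM tenk1 t1 JOIN tenk1 t2 USING (hundred);
RESET work_mem;
RESET temp_file_compression;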

# Benchmark
[... dataset descriptions and results table quoted from the original message ...]

Thanks. I'll try to do some benchmarks on my own.

Are these results from a single run, or an average of multiple runs? Do
you maybe have a script to reproduce this, including the data generation?

Also, can you share some information about the machine used for this? I
expect the impact to strongly depend on memory pressure - if the temp
file fits into the page cache (and stays there), it may not benefit from
compression, right?

regards

--
Tomas Vondra

#4 Filip Janus
fjanus@redhat.com
In reply to: Tomas Vondra (#3)
Re: Proposal: Adding compression of temporary files

-Filip-

st 20. 11. 2024 v 1:35 odesílatel Tomas Vondra <tomas@vondra.me> napsal:

Hi,

On 11/18/24 22:58, Filip Janus wrote:

...
Hi all,
Postgresql supports data compression nowadays, but the compression of
temporary files has not been implemented yet. The huge queries can
produce a significant amount of temporary data that needs to
be stored on disk
and cause many expensive I/O operations.
I am attaching a proposal of the patch to enable temporary files
compression for
hashjoins for now. Initially, I've chosen the LZ4 compression
algorithm. It would
probably make better sense to start with pglz, but I realized it late.

Thanks for the idea & patch. I agree this might be quite useful for
workloads generating a lot of temporary files for stuff like sorts etc.
I think it will be interesting to think about the trade offs, i.e. how
to pick the compression level - at some point the compression ratio
stops improving while paying more and more CPU time. Not sure what the
right choice is, so using default seems fine.

I agree it'd be better to start with pglz, and only then add lz4 etc.
Firstly, pglz is simply the built-in compression, supported everywhere.
And it's also simpler to implement, I think.

# Future possible improvements
Reducing the number of memory allocations within the dumping and
loading of
the buffer. I have two ideas for solving this problem. I would
either add a buffer into
struct BufFile or provide the buffer as an argument from the caller.
For the sequential
execution, I would prefer the second option.

Yes, this would be good. Doing a palloc+pfree for each compression is
going to be expensive, especially because these buffers are going to be
large - likely larger than 8kB. Which means it's not cached in the
memory context, etc.

Adding it to the BufFile is not going to fly, because that doubles the
amount of memory per file. And we already have major issues with hash
joins consuming massive amounts of memory. But at the same time the
buffer is only needed during compression, and there's only one at a
time. So I agree with passing a single buffer as an argument.

# Future plan/open questions
In the future, I would like to add support for pglz and zstd.
Further, I plan to
extend the support of the temporary file compression also for
sorting, gist index creation, etc.

Experimenting with the stream mode of compression algorithms. The
compression
ratio of LZ4 in block mode seems to be satisfying, but the stream
mode could
produce a better ratio, but it would consume more memory due to the
requirement to store
context for LZ4 stream compression.

One thing I realized is that this only enables temp file compression for
a single place - hash join spill files. AFAIK this is because compressed
files don't support random access, and the other places might need that.

Is that correct? The patch does not explain this anywhere. If that's
correct, the patch probably should mention this in a comment for the
'compress' argument added to BufFileCreateTemp(), so that it's clear
when it's legal to set compress=true.

I will add the description there.

Which other places might compress temp files? Surely hash joins are not
the only place that could benefit from this, right?

Yes, you are definitely right. I chose hash joins as a POC because there
are no seeks apart from the seek to the beginning of the buffer. I have
focused on hash joins, but there are definitely other places where the
compression could be used, and I want to add support for them in the
future.

Another thing is testing. If I run regression tests, it won't use
compression at all, because the GUC has "none" by default, right? But we
need some testing, so how would we do that? One option would be to add a
regression test that explicitly sets the GUC and does a hash join, but
that won't work with lz4 (because that may not be enabled).

Right, it's "none" by default. My opinion is that we should test every
supported compression method, so I will try to add an environment variable
as you recommended.

Another option might be to add a PG_TEST_xxx environment variable that
determines compression to use. Something like PG_TEST_USE_UNIX_SOCKETS.
But perhaps there's a simpler way.

# Benchmark
[... dataset descriptions and results table quoted from the original message ...]

Thanks. I'll try to do some benchmarks on my own.

Are these results from a single run, or an average of multiple runs?

It is an average of multiple runs.

Do you maybe have a script to reproduce this, including the data generation?

I am attaching my SQL file for database preparation. I also did further
testing on two other machines (see the attached huge_table.rtf).

Also, can you share some information about the machine used for this? I
expect the impact to strongly depend on memory pressure - if the temp
file fits into the page cache (and stays there), it may not benefit from
compression, right?

If it fits into the page cache thanks to compression, I would consider
that a benefit of compression.
I performed further testing on machines with different memory sizes.
Both experiments showed that compression was beneficial for execution
time; the reduction was more significant on the machine with less memory
available.

Tests were performed on:
MacBook Pro M3, 36GB - macOS
ARM64 virtual machine, 10GB / 6 CPU - Fedora 39

Show quoted text

regards

--
Tomas Vondra

Attachments:

huge_table.rtf (text/rtf)
lz4.sql
#5 Filip Janus
fjanus@redhat.com
In reply to: Filip Janus (#4)
Re: Proposal: Adding compression of temporary files

I've added a regression test for LZ4 compression; it runs if the server is
compiled with the --with-lz4 option.

-Filip-

ne 24. 11. 2024 v 15:53 odesílatel Filip Janus <fjanus@redhat.com> napsal:

Show quoted text


Attachments:

0001-This-commit-adds-support-for-temporary-files-compres-v3.patch (+2005, -16)
#6 Filip Janus
fjanus@redhat.com
In reply to: Filip Janus (#5)
Re: Proposal: Adding compression of temporary files

Even though I started with LZ4, I have also added pglz support and enhanced
the memory management based on the review.

-Filip-

čt 28. 11. 2024 v 12:32 odesílatel Filip Janus <fjanus@redhat.com> napsal:

Show quoted text


Attachments:

0001-This-commit-adds-support-for-temporary-files-compres.patch (+2006, -17)
0002-This-commit-enhance-temporary-file-compression.patch (+78, -33)
0003-Add-test-for-pglz-compression-of-temporary-files.patch (+1795, -2)
#7 Filip Janus
fjanus@redhat.com
In reply to: Filip Janus (#6)
Re: Proposal: Adding compression of temporary files

I apologize for multiple messages, but I found a small bug in the previous
version.

-Filip-

so 4. 1. 2025 v 23:40 odesílatel Filip Janus <fjanus@redhat.com> napsal:

Show quoted text


Attachments:

0001-This-commit-adds-support-for-temporary-files-compres.patch (+2006/-17)
0002-This-commit-enhance-temporary-file-compression.patch (+80/-39)
0003-Add-test-for-pglz-compression-of-temporary-files.patch (+1795/-2)
#8Alexander Korotkov
aekorotkov@gmail.com
In reply to: Filip Janus (#7)
Re: Proposal: Adding compression of temporary files

On Sun, Jan 5, 2025 at 1:43 AM Filip Janus <fjanus@redhat.com> wrote:

I apologize for multiple messages, but I found a small bug in the previous version.

-Filip-

Great, thank you for your work.

I think the patches could use a pgindent run.

I don't see a reason why the temp file compression method should be
different from the wal compression methods, which we already have
in-tree. Perhaps it would be nice to have a 0001 patch, which would
abstract the compression methods we now have for wal into a separate
file containing GUC option values and functions for
compress/decompress. Then, 0002 would apply this to temporary file
compression.

------
Regards,
Alexander Korotkov
Supabase

#9Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alexander Korotkov (#8)
Re: Proposal: Adding compression of temporary files

On 3/15/25 11:40, Alexander Korotkov wrote:

On Sun, Jan 5, 2025 at 1:43 AM Filip Janus <fjanus@redhat.com> wrote:

I apologize for multiple messages, but I found a small bug in the previous version.

-Filip-

Great, thank you for your work.

I think the patches could use a pgindent run.

I don't see a reason why the temp file compression method should be
different from the wal compression methods, which we already have
in-tree. Perhaps it would be nice to have a 0001 patch, which would
abstract the compression methods we now have for wal into a separate
file containing GUC option values and functions for
compress/decompress. Then, 0002 would apply this to temporary file
compression.

Not sure I understand the design you're proposing ...

AFAIK the WAL compression is not compressing the file data directly,
it's compressing backup blocks one by one, which then get written to WAL
as one piece of a record. So it's dealing with individual blocks, not
files, and we already have API to compress blocks (well, it's pretty
much the APIs for each compression method).

You're proposing abstracting that into a separate file - what would be
in that file? How would you abstract this to make it also useful for
file compression?

I can imagine a function CompressBufffer(method, dst, src, ...) wrapping
the various compression methods, unifying the error handling, etc. I can
imagine that, but that API is also limiting - e.g. how would that work
with stream compression, which seems irrelevant for WAL, but might be
very useful for tempfile compression.

IIRC this is mostly why we didn't try to do such a generic API for pg_dump
compression, there's a local pg_dump-specific abstraction.

FWIW looking at the patch, I still don't quite understand why it needs
to correct the offset like this:

+    if (!file->compress)
+        file->curOffset -= (file->nbytes - file->pos);
+    else
+        if (nbytesOriginal - file->pos != 0)
+            /* curOffset must be corrected also if compression is
+             * enabled, nbytes was changed by compression but we
+             * have to use the original value of nbytes
+             */
+            file->curOffset-=bytestowrite;

It's not something introduced by the compression patch - the first part
is what we used to do before. But I find it a bit confusing - isn't it
mixing the correction of "logical file position" adjustment we did
before, and also the adjustment possibly needed due to compression?

In fact, isn't it going to fail if the code gets multiple loops in

while (wpos < file->nbytes)
{
...
}

because bytestowrite will be the value from the last loop? I haven't
tried, but I guess writing wide tuples (more than 8k) might fail.

regards

--
Tomas Vondra

#10Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tomas Vondra (#9)
Re: Proposal: Adding compression of temporary files

On Tue, Mar 18, 2025 at 12:13 AM Tomas Vondra <tomas@vondra.me> wrote:

On 3/15/25 11:40, Alexander Korotkov wrote:

On Sun, Jan 5, 2025 at 1:43 AM Filip Janus <fjanus@redhat.com> wrote:

I apologize for multiple messages, but I found a small bug in the previous version.

-Filip-

Great, thank you for your work.

I think the patches could use a pgindent run.

I don't see a reason why the temp file compression method should be
different from the wal compression methods, which we already have
in-tree. Perhaps it would be nice to have a 0001 patch, which would
abstract the compression methods we now have for wal into a separate
file containing GUC option values and functions for
compress/decompress. Then, 0002 would apply this to temporary file
compression.

Not sure I understand the design you're proposing ...

AFAIK the WAL compression is not compressing the file data directly,
it's compressing backup blocks one by one, which then get written to WAL
as one piece of a record. So it's dealing with individual blocks, not
files, and we already have API to compress blocks (well, it's pretty
much the APIs for each compression method).

You're proposing abstracting that into a separate file - what would be
in that file? How would you abstract this to make it also useful for
file compression?

I can imagine a function CompressBufffer(method, dst, src, ...) wrapping
the various compression methods, unifying the error handling, etc. I can
imagine that, but that API is also limiting - e.g. how would that work
with stream compression, which seems irrelevant for WAL, but might be
very useful for tempfile compression.

Yes, I was thinking about some generic API that provides a safe way to
compress some data chunk with a given compression method. It seems that
it should suit WAL, toast, and temp files in the current
implementation. But, yes, if we implement streaming compression
in the future, that would require another API.

------
Regards,
Alexander Korotkov
Supabase

#11Filip Janus
fjanus@redhat.com
In reply to: Tomas Vondra (#9)
Re: Proposal: Adding compression of temporary files

-Filip-

On Mon, Mar 17, 2025 at 11:13 PM, Tomas Vondra <tomas@vondra.me> wrote:

On 3/15/25 11:40, Alexander Korotkov wrote:

On Sun, Jan 5, 2025 at 1:43 AM Filip Janus <fjanus@redhat.com> wrote:

I apologize for multiple messages, but I found a small bug in the

previous version.

-Filip-

Great, thank you for your work.

I think the patches could use a pgindent run.

I don't see a reason why the temp file compression method should be
different from the wal compression methods, which we already have
in-tree. Perhaps it would be nice to have a 0001 patch, which would
abstract the compression methods we now have for wal into a separate
file containing GUC option values and functions for
compress/decompress. Then, 0002 would apply this to temporary file
compression.

Not sure I understand the design you're proposing ...

AFAIK the WAL compression is not compressing the file data directly,
it's compressing backup blocks one by one, which then get written to WAL
as one piece of a record. So it's dealing with individual blocks, not
files, and we already have API to compress blocks (well, it's pretty
much the APIs for each compression method).

You're proposing abstracting that into a separate file - what would be
in that file? How would you abstract this to make it also useful for
file compression?

I can imagine a function CompressBufffer(method, dst, src, ...) wrapping
the various compression methods, unifying the error handling, etc. I can
imagine that, but that API is also limiting - e.g. how would that work
with stream compression, which seems irrelevant for WAL, but might be
very useful for tempfile compression.

IIRC this is mostly why we didn't try to do such generic API for pg_dump
compression, there's a local pg_dump-specific abstraction.

FWIW looking at the patch, I still don't quite understand why it needs
to correct the offset like this:

+    if (!file->compress)
+        file->curOffset -= (file->nbytes - file->pos);

This line of code is really confusing to me, and I wasn't able to fully
understand why it must be done,
but I experimented with it, and if I remember correctly, it's triggered
(the result differs from 0) mainly in the last call of
BufFileDumpBuffer function for a single data chunk.

+    else
+        if (nbytesOriginal - file->pos != 0)
+            /* curOffset must be corrected also if compression is
+             * enabled, nbytes was changed by compression but we
+             * have to use the original value of nbytes
+             */
+            file->curOffset-=bytestowrite;

It's not something introduced by the compression patch - the first part
is what we used to do before. But I find it a bit confusing - isn't it
mixing the correction of "logical file position" adjustment we did
before, and also the adjustment possibly needed due to compression?

In fact, isn't it going to fail if the code gets multiple loops in

while (wpos < file->nbytes)
{
...
}

because bytestowrite will be the value from the last loop? I haven't
tried, but I guess writing wide tuples (more than 8k) might fail.

I will definitely test it with larger tuples than 8K.

Maybe I don't understand it correctly,
the adjustment is performed in the case that file->nbytes and file->pos
differ.
So it must persist also when we are working with compressed data, but the
problem is that the data stored compressed on disk has a different size than
the incoming uncompressed data, so what should the correction value be?
By debugging, I realized that the correction should correspond to the size
of bytestowrite from the last iteration of the loop.

Show quoted text

regards

--
Tomas Vondra

#12Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Filip Janus (#11)
Re: Proposal: Adding compression of temporary files

On Fri, Mar 28, 2025 at 09:23:13AM GMT, Filip Janus wrote:

+    else
+        if (nbytesOriginal - file->pos != 0)
+            /* curOffset must be corrected also if compression is
+             * enabled, nbytes was changed by compression but we
+             * have to use the original value of nbytes
+             */
+            file->curOffset-=bytestowrite;

It's not something introduced by the compression patch - the first part
is what we used to do before. But I find it a bit confusing - isn't it
mixing the correction of "logical file position" adjustment we did
before, and also the adjustment possibly needed due to compression?

In fact, isn't it going to fail if the code gets multiple loops in

while (wpos < file->nbytes)
{
...
}

because bytestowrite will be the value from the last loop? I haven't
tried, but I guess writing wide tuples (more than 8k) might fail.

I will definitely test it with larger tuples than 8K.

Maybe I don't understand it correctly,
the adjustment is performed in the case that file->nbytes and file->pos
differ.
So it must persist also if we are working with the compressed data, but the
problem is that data stored and compressed on disk has different sizes than
data incoming uncompressed ones, so what should be the correction value.
By debugging, I realized that the correction should correspond to the size
of
bytestowrite from the last iteration of the loop.

I agree, this looks strange. If the idea is to set curOffset to its
original value + pos, and the original value was advanced multiple times
by bytestowrite, it seems incorrect to adjust it by bytestowrite only
once. From what I see, the current tests do not exercise a case where the
while loop gets multiple iterations, so it looks fine.

At the same time maybe I'm missing something, but how exactly should such
a test for 8k tuples and multiple loops in the while block look?
E.g. when I force a hash join on a table with a single wide text column,
the minimal tuple that is getting written to the temporary file still
has a rather small length, I assume due to toasting. Is there some other
way to achieve that?
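
One approach that might work (an untested sketch, not something from this
thread): compute the wide value in the query itself, so it never goes
through TOAST and has to be written to the batch files inline:

    -- sketch: multi-batch hash join whose minimal tuples exceed 8kB
    set work_mem = '64kB';
    set enable_mergejoin = off;
    set enable_nestloop = off;
    select count(*)
      from (select i, repeat(md5(i::text), 300) as wide   -- ~9.6kB per row
              from generate_series(1, 10000) i) a
      join generate_series(1, 10000) b(i) using (i);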

#13Filip Janus
fjanus@redhat.com
In reply to: Dmitry Dolgov (#12)
Re: Proposal: Adding compression of temporary files

Since the patch was prepared months ago, it needs to be rebased.

-Filip-

On Sun, Apr 13, 2025 at 9:53 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Show quoted text

Attachments:

0002-Add-test-for-temporary-files-compression-this-commit.patch (+1259/-2)
0001-This-commit-adds-support-for-temporary-files-compres.patch (+252/-17)
#14Filip Janus
fjanus@redhat.com
In reply to: Filip Janus (#13)
Re: Proposal: Adding compression of temporary files

The latest rebase.

-Filip-

On Tue, Apr 22, 2025 at 9:17 AM, Filip Janus <fjanus@redhat.com> wrote:

Show quoted text

Attachments:

0002-Add-test-for-temporary-files-compression-this-commit.patch (+3591/-2)
0001-This-commit-adds-support-for-temporary-files-compres.patch (+251/-16)
#15Andres Freund
andres@anarazel.de
In reply to: Filip Janus (#14)
Re: Proposal: Adding compression of temporary files

Hi,

On 2025-04-25 23:54:00 +0200, Filip Janus wrote:

The latest rebase.

This often seems to fail during tests:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5382

E.g.
https://api.cirrus-ci.com/v1/artifact/task/4667337632120832/testrun/build-32/testrun/recovery/027_stream_regress/log/regress_log_027_stream_regress

=== dumping /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/regression.diffs ===
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join_hash_pglz.out /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/join_hash_pglz.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/join_hash_pglz.out	2025-05-26 05:04:40.686524215 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/join_hash_pglz.out	2025-05-26 05:15:00.534907680 +0000
@@ -594,11 +594,8 @@
 select count(*) from join_foo
   left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
   on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
- count 
--------
-     3
-(1 row)
-
+ERROR:  could not read from temporary file: read only 8180 of 1572860 bytes
+CONTEXT:  parallel worker
 select final > 1 as multibatch
   from hash_join_batches(
 $$
@@ -606,11 +603,7 @@
     left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
     on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
 $$);
- multibatch 
-------------
- t
-(1 row)
-
+ERROR:  current transaction is aborted, commands ignored until end of transaction block
 rollback to settings;
 -- single-batch with rescan, parallel-oblivious
 savepoint settings;

Greetings,

Andres

#16Filip Janus
fjanus@redhat.com
In reply to: Andres Freund (#15)
Re: Proposal: Adding compression of temporary files

I rebased the proposal and fixed the problem causing those failures.

-Filip-

On Tue, Jun 17, 2025 at 4:49 PM, Andres Freund <andres@anarazel.de> wrote:

Show quoted text

Attachments:

0002-Add-regression-tests-for-temporary-file-compression.patch (+3591/-10)
0001-Add-transparent-compression-for-temporary-files.patch (+337/-31)
#17Filip Janus
fjanus@redhat.com
In reply to: Filip Janus (#16)
Re: Proposal: Adding compression of temporary files

Fix overlooked compiler warnings

-Filip-

On Mon, Aug 18, 2025 at 6:51 PM, Filip Janus <fjanus@redhat.com> wrote:

Show quoted text

Attachments:

0001-Add-transparent-compression-for-temporary-files.patch (+342/-32)
0002-Add-regression-tests-for-temporary-file-compression.patch (+3591/-10)
#18Filip Janus
fjanus@redhat.com
In reply to: Filip Janus (#17)
Re: Proposal: Adding compression of temporary files

Rebase after changes introduced in guc_tables.c

-Filip-

On Tue, Aug 19, 2025 at 5:48 PM, Filip Janus <fjanus@redhat.com> wrote:

Show quoted text

Attachments:

0001-Add-transparent-compression-for-temporary-files.patch (+338/-32)
0002-Add-regression-tests-for-temporary-file-compression.patch (+3591/-10)
#19Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Filip Janus (#18)
Re: Proposal: Adding compression of temporary files

Hello Filip,

Thanks for the updated patch, and for your patience with working on this
patch with (unfortunately) little feedback. I took a look at the patch,
and did some testing. In general, I think it's heading in the right
direction, but there's still a couple of issues and open questions.

Attached is a collection of incremental patches with the proposed
changes. I'll briefly explain the motivation for each patch, but it's
easier to share the complete change as a patch. Feel free to disagree
with the changes, some are a matter of opinion, and/or there might be a
better way to do that. Ultimately it should be squashed to the main
patch, or perhaps a couple larger patches.

v20250930-0001-Add-transparent-compression-for-temporary-.patch

- original patch, as posted on 2025/09/26

v20250930-0002-whitespace.patch

- cleanup of whitespace issues
- This is visible in git-show or when applying using git-am.

v20250930-0003-pgindent.patch

- pgindent run, to fix code style (formatting of if/else branches,
indentation, that kind of stuff)
- good to run pgindent every now and then, for consistency

v20250930-0004-Add-regression-tests-for-temporary-file-co.patch

- original patch, as posted on 2025/09/26

v20250930-0005-remove-unused-BufFile-compress_tempfile.patch

- the compress_tempfile was unused, get rid of it

v20250930-0006-simplify-BufFileCreateTemp-interface.patch

- I think the proposed interface (a "compress" flag in BufFileCreateTemp
and then a separate method BufFileCreateCompressTemp) is not great.

- The "compress" flag is a bit pointless, because even if you set it to
"true" you won't get compressed file. In fact, it's fragile - you'll get
broken BufFile without the buffer.

- The patch gets rid of the "compress" flag (so existing callers of
BufFileCreateTemp remain unchanged). BufFileCreateCompressTemp sets the
flag directly, which it can because it's in the same module.

- An alternative would be to keep the flag, do all the compression setup
in BufFileCreateTemp, and get rid of BufFileCreateCompressTemp. Not sure
which is better.

v20250930-0007-improve-BufFileCreateTemp-BufFileCreateCom.patch

- Just improving comments, to document the new stuff (needs a check).

- There are two new XXX comments, with questions. One asks if the
allocation is an issue in practice - is the static buffer worth it? The
other suggests we add an assert protecting against unsupported seeks.

v20250930-0008-BufFileCreateCompressTemp-cleanup-and-comm.patch

- A small BufFileCreateCompressTemp cleanup (better comments, better
variable names, formatting, extra assert, ... mostly cosmetic stuff).

- But this made me realize the 'static buffer' idea is likely flawed, at
least in the current code. It does pfree() on the current buffer, but how
does it know if there are other files referencing it? Because it then
stashes the buffer in file->buffer. I haven't tried to reproduce the
issue, nor fixed this, but it seems like it might be a problem if two
files happen to use a different compression method.

v20250930-0009-minor-BufFileLoadBuffer-cleanup.patch

- Small BufFileLoadBuffer cleanup, I don't think it's worth it to have
such detailed error messages. So just use "could not read file" and then
whatever libc appends as %m.

v20250930-0010-BufFileLoadBuffer-simpler-FileRead-handlin.patch
v20250930-0011-BufFileLoadBuffer-simpler-FileRead-handlin.patch

- I was somewhat confused by the FileRead handling in BufFileLoadBuffer,
so these two patches try to improve / clarify it.

- I still don't understand the purpose of the "if (nread_orig <= 0)"
branch removed by the second patch.

v20250930-0012-BufFileLoadBuffer-comment-update.patch

- minor comment tweak

v20250930-0013-BufFileLoadBuffer-simplify-skipping-header.patch

- I found it confusing how the code advanced the offset by first adding
to header_advance, and only then adding to curOffset much later. This
gets rid of that, and just advances curOffset right after each read.

v20250930-0014-BufFileDumpBuffer-cleanup-simplification.patch

- Improve the comments in BufFileDumpBuffer, and simplify the code. This
is somewhat subjective, but I think the code is more readable.

- It temporarily removes the handling of -1 for pglz compression. This
was a mistake, and is fixed by a later patch.

v20250930-0015-BufFileLoadBuffer-comment.patch

- XXX for a comment I don't understand.

v20250930-0016-BufFileLoadBuffer-missing-FileRead-error-h.patch

- Points out a FileRead call missing error handling (there's another one
with the same issue).

v20250930-0017-simplify-the-compression-header.patch

- I came to the conclusion that having one "length" field for lz4 and
two (compressed + raw) for pglz makes the code unnecessarily complex,
without gaining much. So this just adds a "header" struct with both
lengths for all compression algorithms. I think it's cleaner/simpler.

v20250930-0018-undo-unncessary-changes-to-Makefile.patch

- Why did the 0001 patch add this? Maybe it's something we should add
separately, not as part of this patch?

v20250930-0019-enable-compression-for-tuplestore.patch

- Enables compression for tuplestores that don't require random access.
This covers e.g. tuplestores produced by SRFs like generate_series, etc.
(see the example below).

- I still wonder what it would take to support random access. What if we
remember the offset of each block? We could write that into an uncompressed
file. That'd be 128kB per 1GB, which seems acceptable. Just a thought.
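
As a trivial illustration (my example, not something from the patch), a
function scan like this materializes its output in a tuplestore, and with a
small work_mem it spills to a temporary file, so it should be covered by the
new compression:

    -- SRF output is materialized in a tuplestore; with a small work_mem
    -- it spills to a temporary file
    SET work_mem = '4MB';
    SELECT count(*) FROM generate_series(1, 10000000) g(i);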

v20250930-0020-remember-compression-method-for-each-file.patch

- The code only tracked bool "compress" flag for each file, and then
determined the algorithm during compression/decompression based on the
GUC variable. But that's incorrect, because the GUC can change during
the file life time. For example, there can be a cursor, anb you can do
SET temp_file_compression between the FETCH calls (see the commit
message for an example).

- So this replaces the flag with storing the actual method.
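
To illustrate that scenario (a sketch only, assuming the temp_file_compression
GUC from this patch series; t1/t2 stand for any tables large enough that the
join spills to temp files):

    BEGIN;
    SET temp_file_compression = 'lz4';
    DECLARE c CURSOR FOR SELECT * FROM t1 JOIN t2 USING (id);
    FETCH 100 FROM c;                      -- batch files written with lz4
    SET temp_file_compression = 'pglz';    -- GUC changes mid-scan
    FETCH 100 FROM c;                      -- must still read the lz4-compressed files
    COMMIT;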

v20250930-0021-LZ4_compress_default-returns-0-on-error.patch

- The LZ4_compress_default returns 0 in case of error. Probably a bug
introduced by one of my earlier patches.

v20250930-0022-try-LZ4_compress_fast.patch

- Experimental patch, trying a faster LZ4 compression.

So that's what I have at the moment. I'm also doing some testing,
measuring the effect of compression both for trivial queries (based on
generate_series) and more complex ones from TPC-H.

I'll post the complete results when I have that, but the results I've
seen so far show that:

- pglz and lz4 end up with about the same compression ratio (in TPC-H
it's often cutting the temp files in about half)

- lz4 is on par with no compression (it's pretty much within noise),
while pglz is much slower (sometimes ~2x slower)

I wonder how gzip/zstandard would perform. My guess would be that gzip
would be faster than pglz, but still slower than lz4. Zstandard is much
closer to lz4. Would it be possible to have some experimental patches
for gzip/zstd, so that we can try? It'd also validate that the code is
prepared for adding more algorithms in the future.

The other thing I was thinking about was the LZ4 stream compression.
There's a comment suggesting it might compress better, and indeed - when
working on pg_dump compression we saw a huge improvement. Again, it would
be great to have an experimental patch for this, so that we can
evaluate it.

regards

--
Tomas Vondra

Attachments:

v20250930-0007-improve-BufFileCreateTemp-BufFileCreateCom.patch (+20/-11)
v20250930-0008-BufFileCreateCompressTemp-cleanup-and-comm.patch (+21/-12)
v20250930-0001-Add-transparent-compression-for-temporary-.patch (+338/-32)
v20250930-0002-whitespace.patch (+10/-11)
v20250930-0003-pgindent.patch (+96/-73)
v20250930-0004-Add-regression-tests-for-temporary-file-co.patch (+3591/-10)
v20250930-0005-remove-unused-BufFile-compress_tempfile.patch (+0/-3)
v20250930-0006-simplify-BufFileCreateTemp-interface.patch (+12/-14)
v20250930-0009-minor-BufFileLoadBuffer-cleanup.patch (+10/-13)
v20250930-0010-BufFileLoadBuffer-simpler-FileRead-handlin.patch (+8/-6)
v20250930-0011-BufFileLoadBuffer-simpler-FileRead-handlin.patch (+6/-8)
v20250930-0012-BufFileLoadBuffer-comment-update.patch (+1/-2)
v20250930-0013-BufFileLoadBuffer-simplify-skipping-header.patch (+16/-18)
v20250930-0014-BufFileDumpBuffer-cleanup-simplification.patch (+37/-46)
v20250930-0015-BufFileLoadBuffer-comment.patch (+3/-1)
v20250930-0016-BufFileLoadBuffer-missing-FileRead-error-h.patch (+1/-1)
v20250930-0017-simplify-the-compression-header.patch (+121/-103)
v20250930-0018-undo-unncessary-changes-to-Makefile.patch (+0/-2)
v20250930-0019-enable-compression-for-tuplestore.patch (+7/-2)
v20250930-0020-remember-compression-method-for-each-file.patch (+10/-8)
v20250930-0021-LZ4_compress_default-returns-0-on-error.patch (+1/-2)
v20250930-0022-try-LZ4_compress_fast.patch (+4/-4)
#20Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#19)
Re: Proposal: Adding compression of temporary files

Hi,

On 9/30/25 14:42, Tomas Vondra wrote:

v20250930-0018-undo-unncessary-changes-to-Makefile.patch

- Why did the 0001 patch add this? Maybe it's something we should add
separately, not as part of this patch?

I realized this bit is actually necessary, to make the EXTRA_TESTS work
for the lz4 regression test. The attached patch series skips this bit.

There's also experimental patches adding gzip (or rather libz) and zstd
compression. This is very rough, I just wanted to see how would these
perform compared to pglz/lz4. But I haven't done any proper evaluation
so far, beyond running a couple simple queries. Will try to spend a bit
more time on that soon.

I still wonder about the impact of stream compression. I know it can
improve the compression ratio, but I'm not sure if it also helps with
the compression speed. I think for temporary files faster compression
(and lower ratio) may be a better trade-off. So maybe we should use
lower compression levels ...

Attached are two PDF files with results of the perf evaluation using
TPC-H 10GB and 50GB data sets. One table shows timings for 22 queries
with compression set to no/pglz/lz4, for a range of parameter
combinations (work_mem, parallel workers). The other shows the amount of
temporary file data (in MB) generated by each query.

The timing shows that pglz is pretty slow, about doubling duration for
some of the queries. That's not surprising, we know pglz can be slow.
lz4 is almost perfectly neutral, which is actually great - the goal is
to reduce I/O pressure for temporary files, but with a single query
running at a time, that's not a problem. So "no impact" is about the
best we can do, it shows the lz4 overhead is negligible.

For "size" PDF shows that the compression can save a fair amount of temp
space. For many queries it saves 50-70% of temporary space. A good
example is Q9 which (on the 50GB scale) used to take about 33GB, and
with compression it's down to ~17GB (with both pglz and lz4). That's
pretty good, I think.

FWIW the "size" results may be a bit misleading, in that it measures
tempfile size for the whole query. But some may use multiple temporary
files, and some may not support compression (e.g. tuplesort don't).
Which will make the actual compression ratio look lower. OTOH it's a
more representative of impact on actual queries.

regards

--
Tomas Vondra

Attachments:

v20251001-0001-Add-transparent-compression-for-temporary-.patch (+338/-32)
v20251001-0002-whitespace.patch (+10/-11)
v20251001-0003-pgindent.patch (+96/-73)
v20251001-0004-Add-regression-tests-for-temporary-file-co.patch (+3591/-10)
v20251001-0005-remove-unused-BufFile-compress_tempfile.patch (+0/-3)
v20251001-0006-simplify-BufFileCreateTemp-interface.patch (+12/-14)
v20251001-0007-improve-BufFileCreateTemp-BufFileCreateCom.patch (+20/-11)
v20251001-0008-BufFileCreateCompressTemp-cleanup-and-comm.patch (+21/-12)
v20251001-0009-minor-BufFileLoadBuffer-cleanup.patch (+10/-13)
v20251001-0010-BufFileLoadBuffer-simpler-FileRead-handlin.patch (+8/-6)
v20251001-0011-BufFileLoadBuffer-simpler-FileRead-handlin.patch (+6/-8)
v20251001-0012-BufFileLoadBuffer-comment-update.patch (+1/-2)
v20251001-0013-BufFileLoadBuffer-simplify-skipping-header.patch (+16/-18)
v20251001-0014-BufFileDumpBuffer-cleanup-simplification.patch (+37/-46)
v20251001-0022-experimental-zlib-gzip-compression.patch (+55/-2)
v20251001-0023-experimental-zstd-compression.patch (+55/-2)
v20251001-0015-BufFileLoadBuffer-comment.patch (+3/-1)
v20251001-0016-BufFileLoadBuffer-missing-FileRead-error-h.patch (+1/-1)
v20251001-0017-simplify-the-compression-header.patch (+121/-103)
v20251001-0018-enable-compression-for-tuplestore.patch (+7/-2)
v20251001-0019-remember-compression-method-for-each-file.patch (+10/-8)
v20251001-0020-LZ4_compress_default-returns-0-on-error.patch (+1/-2)
v20251001-0021-try-LZ4_compress_fast.patch (+4/-4)
v20251001-0024-add-regression-test-for-gzip-zlib.patch (+1796/-1)
v20251001-0025-add-regression-test-for-zstd.patch (+1797/-1)
compress-tpch-size.pdf
compress-tpch-timing.pdf
#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#20)
#22Filip Janus
fjanus@redhat.com
In reply to: Alvaro Herrera (#21)
#23lakshmi
lakshmigcdac@gmail.com
In reply to: Filip Janus (#18)
#24Filip Janus
fjanus@redhat.com
In reply to: lakshmi (#23)
#25Filip Janus
fjanus@redhat.com
In reply to: Filip Janus (#24)
#26Zsolt Parragi
zsolt.parragi@percona.com
In reply to: Filip Janus (#25)
#27lakshmi
lakshmigcdac@gmail.com
In reply to: Zsolt Parragi (#26)
#28lakshmi
lakshmigcdac@gmail.com
In reply to: lakshmi (#27)
#29Filip Janus
fjanus@redhat.com
In reply to: lakshmi (#28)
#30Filip Janus
fjanus@redhat.com
In reply to: Filip Janus (#29)
#31lakshmi
lakshmigcdac@gmail.com
In reply to: Filip Janus (#30)
#32Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Filip Janus (#30)