From 87b823821abd4ef53214ba5954d8e8069a1c40fc Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 14 Oct 2024 15:48:01 +0700
Subject: [PATCH] Consider extreme skew in batch 0 during Parallel Hash Join
 execution.

Batch 0 doesn't maintain the estimated_size value. In this case, extreme skew
caused by massive duplicates must instead be detected by observing that the
same number of tuples remains in the batch after repartitioning.

This bug doesn't exist in the non-parallel hash join because there the batch
repartitioning procedure is driven by the hash table size and the number of
tuples in the batch.
---
 src/backend/executor/nodeHash.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 570a90ebe1..319006d69a 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -1242,11 +1242,16 @@ ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable)
 
 					if (batch->space_exhausted ||
 						batch->estimated_size > pstate->space_allowed)
+						space_exhausted = true;
+
+					/*
+					 * Batch 0 doesn't maintain estimated_size, so it should be
+					 * treated as a special case.
+					 */
+					if (space_exhausted || i == 0)
 					{
 						int			parent;
 
-						space_exhausted = true;
-
 						/*
 						 * Did this batch receive ALL of the tuples from its
 						 * parent batch?  That would indicate that further
-- 
2.39.5

