Speed up Hash Join by teaching ExprState about hashing

Started by David Rowleyover 1 year ago8 messages
#1David Rowley
dgrowleyml@gmail.com
2 attachment(s)

In master, if you look at ExecHashGetHashValue() in nodeHash.c, you
can see that it calls ExecEvalExpr() and then manually calls the hash
function on the returned value. This process is repeated once for each
hash key. This is inefficient for a few reasons:

1) ExecEvalExpr() will only deform tuples up the max varattno that's
mentioned in the hash key. That means we might have to deform
attributes in multiple steps, once for each hash key.
2) ExecHashGetHashValue() is very branchy and checks if hashStrict[]
and keep_nulls on every loop. There's also a branch to check which
hash functions to use.
3) foreach isn't exactly the pinnacle of efficiency either.

All of the above points can be improved by making ExprState handle
hashing. This means we'll deform all attributes that are needed for
hashing once, rather than incrementally once per key. This also allows
JIT compilation of hashing ExprStates, which will make things even
faster.

The attached patch implements this. Here are some performance numbers.

## Test 1: rows=1000 jit=0

1 hash key
master = 4938.5 tps
patched = 5126.7 tps (+3.81%)

2 hash keys
master = 4326.4 tps
patched = 4520.2 tps (+4.48%)

3 hash keys
master = 4145.5 tps
patched = 4559.7 tps (+9.99%)

## Test 2: rows = 1000000 jit=1 (with opt and inline)

1 hash key
master = 3.663 tps
patched = 3.816 tps (+4.16%)

2 hash keys
master = 3.392 tps
patched = 3.550 tps (+4.67%)

3 hash keys
master = 3.086 tps
patched = 3.411 tps (+10.55%)

Benchmark script attached

Notes:
The ExecBuildHash32Expr() function to build the ExprState isn't called
from the same location as the previous ExecInitExprList() code. The
reason for this is that it's not possible to build the ExprState for
hashing in ExecInitHash() because we don't yet know the jointype and
we need to know that because the expression ExecBuildHash32Expr()
needs to allow NULLs for outer join types. I've put the
ExecBuildHash32Expr() call in ExecInitHashJoin() just after we set
hj_NullOuterTupleSlot and hj_NullOuterTupleSlot fields. I tried
having this code in ExecHashTableCreate(). but that's no good as we
only call that during executor run, which is too late as any SubPlans
in the hash keys need to be attributed to the correct parent. Since
EXPLAIN shows the subplans, this needs to be done before executor run.

I've not hacked on llvmjit_expr.c much before, so I'd be happy for a
detailed review of that code.

I manually checked hashvalues between JIT and non-JIT. They matched.
If we ever consider JITting more granularly, it might be worth always
applying the same jit flags to the hash exprs on either side of the
join. I've slight concerns about compiler bugs producing different
hash codes. Unsure if there are non-bug reasons for them to differ on
the same CPU architecture.

I've not looked at applications of this beyond hash join. I'm
considering other executor nodes to be follow-on material.

Thanks to Andres Freund for mentioning this idea to me.

Attachments:

bench.sh.txttext/plain; charset=US-ASCII; name=bench.sh.txtDownload
v1-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patchapplication/octet-stream; name=v1-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patchDownload
From d8efd72a3caee4a7244c466d1f4834bf3d881c3d Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Fri, 10 May 2024 20:22:07 +1200
Subject: [PATCH v1] Speed up Hash Join by making ExprStates hash

---
 src/backend/executor/execExpr.c       | 116 +++++++++++++++
 src/backend/executor/execExprInterp.c |  94 +++++++++++++
 src/backend/executor/nodeHash.c       | 194 ++++++--------------------
 src/backend/executor/nodeHashjoin.c   | 142 +++++++++++++++----
 src/backend/jit/llvm/llvmjit_expr.c   | 127 +++++++++++++++++
 src/include/executor/execExpr.h       |  14 ++
 src/include/executor/executor.h       |   7 +
 src/include/executor/hashjoin.h       |  12 --
 src/include/executor/nodeHash.h       |   3 +-
 src/include/nodes/execnodes.h         |  12 +-
 10 files changed, 525 insertions(+), 196 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index b9ebc827a7..a2585c087b 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3931,6 +3931,122 @@ ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 	}
 }
 
+/*
+ * Build an ExprState that calls the given hash function(s) on the given
+ * 'hash_exprs'.  When multiple expressions are present, the hash values
+ * returned by each function are combined to produce a single hash value.
+ *
+ * desc: tuple descriptor to extract 'hash_exprs' from
+ * ops: TupleTableSlotOps for the TupleDesc
+ * hashfunc_oids: Oid for each hash function to call, one for each 'hash_expr'
+ * collations: collation to use when calling the hash function.
+ * hash_expr: list of expressions to hash the value of
+ * opstrict: array corresponding to the 'hashfunc_oids' to store op_strict()
+ * parent: PlanState node that the 'hash_exprs' will be evaluated at
+ * keep_nulls: if true, evaluation of the returned ExprState will abort early
+ * returning NULL if the given hash function is strict and the Datum to hash
+ * is null.  When set to false, any NULL input Datums are skipped and the
+ * resulting hash value is unchanged, i.e. it's a hash of all non-null
+ * 'hash_exprs', or 0 if all 'hash_exprs' evaluate to NULL Datums.
+ */
+ExprState *
+ExecBuildHash32Expr(TupleDesc desc, const TupleTableSlotOps *ops,
+					const Oid *hashfunc_oids, const List *collations,
+					const List *hash_exprs, const bool *opstrict,
+					PlanState *parent, bool keep_nulls)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	intptr_t	strict_opcode;
+	intptr_t	opcode;
+
+	Assert(list_length(hash_exprs) == list_length(collations));
+
+	state->parent = parent;
+
+	/* Insert setup steps as needed */
+	ExecCreateExprSetupSteps(state, (Node *) hash_exprs);
+
+	/*
+	 * The first time we hash, we don't need to combine the hash value with
+	 * any previous hash, so use the EEOP_HASHDATUM_FIRST or the strict
+	 * variant.
+	 */
+	strict_opcode = EEOP_HASHDATUM_FIRST_STRICT;
+	opcode = EEOP_HASHDATUM_FIRST;
+
+	forboth(lc, hash_exprs, lc2, collations)
+	{
+		Expr	   *expr = (Expr *) lfirst(lc);
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		int			i = foreach_current_index(lc);
+		Oid			funcid;
+		Oid			inputcollid = lfirst_oid(lc2);
+
+		funcid = hashfunc_oids[i];
+
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+
+		/* Allocate hash function lookup data  */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(1));
+
+		fmgr_info(funcid, finfo);
+
+		/* Initialize function call parameter structure too */
+		InitFunctionCallInfoData(*fcinfo, finfo, 1, inputcollid, NULL, NULL);
+
+		scratch.d.hashdatum.finfo = finfo;
+		scratch.d.hashdatum.fcinfo_data = fcinfo;
+		scratch.d.hashdatum.fn_addr = finfo->fn_addr;
+
+		/*
+		 * Build the steps to evaluate the hash function's argument have it so
+		 * the value of that is stored in the 0th argument of the hash func.
+		 */
+		ExecInitExprRec(expr,
+						state,
+						&fcinfo->args[0].value,
+						&fcinfo->args[0].isnull);
+
+		scratch.opcode = opstrict[i] && !keep_nulls ? strict_opcode : opcode;
+		scratch.d.hashdatum.jumpdone = -1;
+
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps, state->steps_len - 1);
+
+		/*
+		 * For subsequent keys we must combine the hash value with the
+		 * previous hashes.
+		 */
+		strict_opcode = EEOP_HASHDATUM_NEXT32_STRICT;
+		opcode = EEOP_HASHDATUM_NEXT32;
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->d.hashdatum.jumpdone == -1);
+		as->d.hashdatum.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
+
 /*
  * Build equality expression that can be evaluated using ExecQual(), returning
  * true if the expression context's inner/outer tuple are NOT DISTINCT. I.e
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 852186312c..0cfd00d613 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -479,6 +479,10 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_CONVERT_ROWTYPE,
 		&&CASE_EEOP_SCALARARRAYOP,
 		&&CASE_EEOP_HASHED_SCALARARRAYOP,
+		&&CASE_EEOP_HASHDATUM_FIRST,
+		&&CASE_EEOP_HASHDATUM_FIRST_STRICT,
+		&&CASE_EEOP_HASHDATUM_NEXT32,
+		&&CASE_EEOP_HASHDATUM_NEXT32_STRICT,
 		&&CASE_EEOP_XMLEXPR,
 		&&CASE_EEOP_JSON_CONSTRUCTOR,
 		&&CASE_EEOP_IS_JSON,
@@ -1519,6 +1523,96 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			EEO_NEXT();
 		}
 
+		EEO_CASE(EEOP_HASHDATUM_FIRST)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			/* leave the hash value alone on NULL inputs */
+			if (!fcinfo->args[0].isnull)
+			{
+				/* execute the hash function and save the resulting value */
+				*op->resvalue = op->d.hashdatum.fn_addr(fcinfo);
+				*op->resnull = false;
+			}
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_FIRST_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			if (fcinfo->args[0].isnull)
+			{
+				/*
+				 * With strict we have the expression return NULL instead of
+				 * ignoring NULL input values.  We've nothing more to do after
+				 * finding a NULL.
+				 */
+				*op->resnull = true;
+				*op->resvalue = (Datum) 0;
+				EEO_JUMP(op->d.hashdatum.jumpdone);
+			}
+
+			/* execute the hash function and save the resulting value */
+			*op->resvalue = op->d.hashdatum.fn_addr(fcinfo);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_NEXT32)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			/* leave the hash value alone on NULL inputs */
+			if (!fcinfo->args[0].isnull)
+			{
+				uint32		existing_hash = DatumGetUInt32(*op->resvalue);
+				uint32		hashvalue;
+
+				/* combine successive hash values by rotating */
+				existing_hash = pg_rotate_left32(existing_hash, 1);
+
+				/* execute hash func and combine with previous hash value */
+				hashvalue = DatumGetUInt32(op->d.hashdatum.fn_addr(fcinfo));
+				*op->resvalue = UInt32GetDatum(existing_hash ^ hashvalue);
+				*op->resnull = false;
+			}
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_NEXT32_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+			uint32		existing_hash;
+			uint32		hashvalue;
+
+			if (fcinfo->args[0].isnull)
+			{
+				/*
+				 * With strict we have the expression return NULL instead of
+				 * ignoring NULL input values.  We've nothing more to do after
+				 * finding a NULL.
+				 */
+				*op->resnull = true;
+				*op->resvalue = (Datum) 0;
+				EEO_JUMP(op->d.hashdatum.jumpdone);
+			}
+
+			existing_hash = DatumGetUInt32(*op->resvalue);
+			/* combine successive hash values by rotating */
+			existing_hash = pg_rotate_left32(existing_hash, 1);
+
+			/* execute hash func and combine with previous hash value */
+			hashvalue = DatumGetUInt32(op->d.hashdatum.fn_addr(fcinfo));
+			*op->resvalue = UInt32GetDatum(existing_hash ^ hashvalue);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
 		EEO_CASE(EEOP_DOMAIN_NOTNULL)
 		{
 			/* too complex for an inline implementation */
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 61480733a1..ea27bc5ca4 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -47,7 +47,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
-static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
+static void ExecHashBuildSkewHash(HashState *hashstate,
+								  HashJoinTable hashtable, Hash *node,
 								  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 									TupleTableSlot *slot,
@@ -138,11 +139,9 @@ static void
 MultiExecPrivateHash(HashState *node)
 {
 	PlanState  *outerNode;
-	List	   *hashkeys;
 	HashJoinTable hashtable;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
-	uint32		hashvalue;
 
 	/*
 	 * get state info from node
@@ -153,7 +152,6 @@ MultiExecPrivateHash(HashState *node)
 	/*
 	 * set expression context
 	 */
-	hashkeys = node->hashkeys;
 	econtext = node->ps.ps_ExprContext;
 
 	/*
@@ -162,17 +160,30 @@ MultiExecPrivateHash(HashState *node)
 	 */
 	for (;;)
 	{
+		bool		isnull;
+		Datum		hashdatum;
+
 		slot = ExecProcNode(outerNode);
 		if (TupIsNull(slot))
 			break;
 		/* We have to compute the hash value */
 		econtext->ecxt_outertuple = slot;
-		if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
-								 false, hashtable->keepNulls,
-								 &hashvalue))
+
+		ResetExprContext(econtext);
+
+		hashdatum = ExecEvalExprSwitchContext(node->hash_expr, econtext,
+											  &isnull);
+
+		if (!isnull)
 		{
+			uint32		hashvalue = DatumGetUInt32(hashdatum);
 			int			bucketNumber;
 
+#if USE_ASSERT_CHECKING
+			/* XXX just for testing. Not for commit */
+			elog(DEBUG1, "hashdatum = " UINT64_FORMAT, (uint64) hashdatum);
+#endif
+
 			bucketNumber = ExecHashGetSkewBucket(hashtable, hashvalue);
 			if (bucketNumber != INVALID_SKEW_BUCKET_NO)
 			{
@@ -215,7 +226,6 @@ MultiExecParallelHash(HashState *node)
 {
 	ParallelHashJoinState *pstate;
 	PlanState  *outerNode;
-	List	   *hashkeys;
 	HashJoinTable hashtable;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
@@ -232,7 +242,6 @@ MultiExecParallelHash(HashState *node)
 	/*
 	 * set expression context
 	 */
-	hashkeys = node->hashkeys;
 	econtext = node->ps.ps_ExprContext;
 
 	/*
@@ -279,13 +288,20 @@ MultiExecParallelHash(HashState *node)
 			ExecParallelHashTableSetCurrentBatch(hashtable, 0);
 			for (;;)
 			{
+				bool		isnull;
+
 				slot = ExecProcNode(outerNode);
 				if (TupIsNull(slot))
 					break;
 				econtext->ecxt_outertuple = slot;
-				if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
-										 false, hashtable->keepNulls,
-										 &hashvalue))
+
+				ResetExprContext(econtext);
+
+				hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(node->hash_expr,
+																	 econtext,
+																	 &isnull));
+
+				if (!isnull)
 					ExecParallelHashTableInsert(hashtable, slot, hashvalue);
 				hashtable->partialTuples++;
 			}
@@ -371,8 +387,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	hashstate->ps.plan = (Plan *) node;
 	hashstate->ps.state = estate;
 	hashstate->ps.ExecProcNode = ExecHash;
+	/* delay building hashtable until ExecHashTableCreate() in executor run */
 	hashstate->hashtable = NULL;
-	hashstate->hashkeys = NIL;	/* will be set by parent HashJoin */
 
 	/*
 	 * Miscellaneous initialization
@@ -393,12 +409,15 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	ExecInitResultTupleSlotTL(&hashstate->ps, &TTSOpsMinimalTuple);
 	hashstate->ps.ps_ProjInfo = NULL;
 
+	Assert(node->plan.qual == NIL);
+
 	/*
-	 * initialize child expressions
+	 * Delay initialization of hash_expr until ExecInitHashJoin().  We cannot
+	 * build the ExprState here as we don't yet know the join type we're going
+	 * to be hashing values for and we need to know that before calling
+	 * ExecBuildHash32Expr as the keep_nulls parameter depends on the join
+	 * type.
 	 */
-	Assert(node->plan.qual == NIL);
-	hashstate->hashkeys =
-		ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
 
 	return hashstate;
 }
@@ -429,7 +448,7 @@ ExecEndHash(HashState *node)
  * ----------------------------------------------------------------
  */
 HashJoinTable
-ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations, bool keepNulls)
+ExecHashTableCreate(HashState *state)
 {
 	Hash	   *node;
 	HashJoinTable hashtable;
@@ -440,10 +459,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	double		rows;
 	int			num_skew_mcvs;
 	int			log2_nbuckets;
-	int			nkeys;
-	int			i;
-	ListCell   *ho;
-	ListCell   *hc;
 	MemoryContext oldcxt;
 
 	/*
@@ -487,7 +502,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	hashtable->log2_nbuckets = log2_nbuckets;
 	hashtable->log2_nbuckets_optimal = log2_nbuckets;
 	hashtable->buckets.unshared = NULL;
-	hashtable->keepNulls = keepNulls;
 	hashtable->skewEnabled = false;
 	hashtable->skewBucket = NULL;
 	hashtable->skewBucketLen = 0;
@@ -540,32 +554,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 
 	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
 
-	/*
-	 * Get info about the hash functions to be used for each hash key. Also
-	 * remember whether the join operators are strict.
-	 */
-	nkeys = list_length(hashOperators);
-	hashtable->outer_hashfunctions = palloc_array(FmgrInfo, nkeys);
-	hashtable->inner_hashfunctions = palloc_array(FmgrInfo, nkeys);
-	hashtable->hashStrict = palloc_array(bool, nkeys);
-	hashtable->collations = palloc_array(Oid, nkeys);
-	i = 0;
-	forboth(ho, hashOperators, hc, hashCollations)
-	{
-		Oid			hashop = lfirst_oid(ho);
-		Oid			left_hashfn;
-		Oid			right_hashfn;
-
-		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
-			elog(ERROR, "could not find hash function for hash operator %u",
-				 hashop);
-		fmgr_info(left_hashfn, &hashtable->outer_hashfunctions[i]);
-		fmgr_info(right_hashfn, &hashtable->inner_hashfunctions[i]);
-		hashtable->hashStrict[i] = op_strict(hashop);
-		hashtable->collations[i] = lfirst_oid(hc);
-		i++;
-	}
-
 	if (nbatch > 1 && hashtable->parallel_state == NULL)
 	{
 		MemoryContext oldctx;
@@ -652,7 +640,7 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 		 * it.)
 		 */
 		if (nbatch > 1)
-			ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+			ExecHashBuildSkewHash(state, hashtable, node, num_skew_mcvs);
 
 		MemoryContextSwitchTo(oldcxt);
 	}
@@ -1803,103 +1791,6 @@ ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable,
 		heap_free_minimal_tuple(tuple);
 }
 
-/*
- * ExecHashGetHashValue
- *		Compute the hash value for a tuple
- *
- * The tuple to be tested must be in econtext->ecxt_outertuple (thus Vars in
- * the hashkeys expressions need to have OUTER_VAR as varno). If outer_tuple
- * is false (meaning it's the HashJoin's inner node, Hash), econtext,
- * hashkeys, and slot need to be from Hash, with hashkeys/slot referencing and
- * being suitable for tuples from the node below the Hash. Conversely, if
- * outer_tuple is true, econtext is from HashJoin, and hashkeys/slot need to
- * be appropriate for tuples from HashJoin's outer node.
- *
- * A true result means the tuple's hash value has been successfully computed
- * and stored at *hashvalue.  A false result means the tuple cannot match
- * because it contains a null attribute, and hence it should be discarded
- * immediately.  (If keep_nulls is true then false is never returned.)
- */
-bool
-ExecHashGetHashValue(HashJoinTable hashtable,
-					 ExprContext *econtext,
-					 List *hashkeys,
-					 bool outer_tuple,
-					 bool keep_nulls,
-					 uint32 *hashvalue)
-{
-	uint32		hashkey = 0;
-	FmgrInfo   *hashfunctions;
-	ListCell   *hk;
-	int			i = 0;
-	MemoryContext oldContext;
-
-	/*
-	 * We reset the eval context each time to reclaim any memory leaked in the
-	 * hashkey expressions.
-	 */
-	ResetExprContext(econtext);
-
-	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
-	if (outer_tuple)
-		hashfunctions = hashtable->outer_hashfunctions;
-	else
-		hashfunctions = hashtable->inner_hashfunctions;
-
-	foreach(hk, hashkeys)
-	{
-		ExprState  *keyexpr = (ExprState *) lfirst(hk);
-		Datum		keyval;
-		bool		isNull;
-
-		/* combine successive hashkeys by rotating */
-		hashkey = pg_rotate_left32(hashkey, 1);
-
-		/*
-		 * Get the join attribute value of the tuple
-		 */
-		keyval = ExecEvalExpr(keyexpr, econtext, &isNull);
-
-		/*
-		 * If the attribute is NULL, and the join operator is strict, then
-		 * this tuple cannot pass the join qual so we can reject it
-		 * immediately (unless we're scanning the outside of an outer join, in
-		 * which case we must not reject it).  Otherwise we act like the
-		 * hashcode of NULL is zero (this will support operators that act like
-		 * IS NOT DISTINCT, though not any more-random behavior).  We treat
-		 * the hash support function as strict even if the operator is not.
-		 *
-		 * Note: currently, all hashjoinable operators must be strict since
-		 * the hash index AM assumes that.  However, it takes so little extra
-		 * code here to allow non-strict that we may as well do it.
-		 */
-		if (isNull)
-		{
-			if (hashtable->hashStrict[i] && !keep_nulls)
-			{
-				MemoryContextSwitchTo(oldContext);
-				return false;	/* cannot match */
-			}
-			/* else, leave hashkey unmodified, equivalent to hashcode 0 */
-		}
-		else
-		{
-			/* Compute the hash function */
-			uint32		hkey;
-
-			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i], hashtable->collations[i], keyval));
-			hashkey ^= hkey;
-		}
-
-		i++;
-	}
-
-	MemoryContextSwitchTo(oldContext);
-
-	*hashvalue = hashkey;
-	return true;
-}
 
 /*
  * ExecHashGetBucketAndBatch
@@ -2372,7 +2263,8 @@ ExecReScanHash(HashState *node)
  *		based on available memory.
  */
 static void
-ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
+ExecHashBuildSkewHash(HashState *hashstate, HashJoinTable hashtable,
+					  Hash *node, int mcvsToUse)
 {
 	HeapTupleData *statsTuple;
 	AttStatsSlot sslot;
@@ -2400,7 +2292,6 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 	{
 		double		frac;
 		int			nbuckets;
-		FmgrInfo   *hashfunctions;
 		int			i;
 
 		if (mcvsToUse > sslot.nvalues)
@@ -2468,15 +2359,14 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 		 * ExecHashRemoveNextSkewBucket) and we want the least common MCVs to
 		 * be removed first.
 		 */
-		hashfunctions = hashtable->outer_hashfunctions;
 
 		for (i = 0; i < mcvsToUse; i++)
 		{
 			uint32		hashvalue;
 			int			bucket;
 
-			hashvalue = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[0],
-														 hashtable->collations[0],
+			hashvalue = DatumGetUInt32(FunctionCall1Coll(hashstate->skew_hashfunction,
+														 hashstate->skew_collation,
 														 sslot.values[i]));
 
 			/*
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index dbf114cd5e..82b24b503b 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -169,6 +169,7 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "utils/lsyscache.h"
 #include "utils/sharedtuplestore.h"
 #include "utils/wait_event.h"
 
@@ -331,10 +332,7 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
 				 * whoever gets here first will create the hash table and any
 				 * later arrivals will merely attach to it.
 				 */
-				hashtable = ExecHashTableCreate(hashNode,
-												node->hj_HashOperators,
-												node->hj_Collations,
-												HJ_FILL_INNER(node));
+				hashtable = ExecHashTableCreate(hashNode);
 				node->hj_HashTable = hashtable;
 
 				/*
@@ -810,9 +808,95 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	 */
 	{
 		HashState  *hashstate = (HashState *) innerPlanState(hjstate);
+		Hash	   *hash = (Hash *) hashstate->ps.plan;
 		TupleTableSlot *slot = hashstate->ps.ps_ResultTupleSlot;
+		Oid		   *outer_hashfuncid;
+		Oid		   *inner_hashfuncid;
+		bool	   *hash_strict;
+		ListCell   *lc;
+		int			nkeys;
+
 
 		hjstate->hj_HashTupleSlot = slot;
+
+		/*
+		 * Build ExprStates to obtain hash values for either side of the join.
+		 * This must be done here as ExecBuildHash32Expr needs to know how to
+		 * handle NULL inputs and the required handling of that depends on the
+		 * jointype.  We don't know the join type in ExecInitHash() and we
+		 * must build the ExprStates before ExecHashTableCreate() so we
+		 * properly attribute any SubPlans that exist in the hash expressions
+		 * to the correct PlanState.
+		 */
+		nkeys = list_length(node->hashoperators);
+
+		/* XXX worth putting these on the stack when nkeys == 1? */
+		outer_hashfuncid = palloc_array(Oid, nkeys);
+		inner_hashfuncid = palloc_array(Oid, nkeys);
+		hash_strict = palloc_array(bool, nkeys);
+
+		/*
+		 * Determine the hash function for each side of the join for the given
+		 * hash operator.
+		 */
+		foreach(lc, node->hashoperators)
+		{
+			Oid			hashop = lfirst_oid(lc);
+			int			i = foreach_current_index(lc);
+
+			if (!get_op_hash_functions(hashop,
+									   &outer_hashfuncid[i],
+									   &inner_hashfuncid[i]))
+				elog(ERROR,
+					 "could not find hash function for hash operator %u",
+					 hashop);
+			hash_strict[i] = op_strict(hashop);
+		}
+
+		/*
+		 * Build an ExprState to generate the hash value for the expressions
+		 * on the outer of the join.  This ExprState must finish generating
+		 * the hash value when HJ_FILL_OUTER() is true.  Otherwise,
+		 * ExecBuildHash32Expr will set up the ExprState to abort early if it
+		 * finds a NULL.  In these cases, we don't need to store these tuples
+		 * in the hash table as the jointype does not require it.
+		 */
+		hjstate->hj_OuterHash =
+			ExecBuildHash32Expr(hjstate->js.ps.ps_ResultTupleDesc,
+								hjstate->js.ps.resultops,
+								outer_hashfuncid,
+								node->hashcollations,
+								node->hashkeys,
+								hash_strict,
+								&hjstate->js.ps,
+								HJ_FILL_OUTER(hjstate));
+
+		/* As above, but for the inner side of the join */
+		hashstate->hash_expr =
+			ExecBuildHash32Expr(hashstate->ps.ps_ResultTupleDesc,
+								hashstate->ps.resultops,
+								inner_hashfuncid,
+								node->hashcollations,
+								hash->hashkeys,
+								hash_strict,
+								&hashstate->ps,
+								HJ_FILL_INNER(hjstate));
+
+		/*
+		 * Set up the skew table hash function while we have a record of the
+		 * first key's hash function Oid.
+		 */
+		if (OidIsValid(hash->skewTable))
+		{
+			hashstate->skew_hashfunction = palloc0(sizeof(FmgrInfo));
+			hashstate->skew_collation = linitial_oid(node->hashcollations);
+			fmgr_info(outer_hashfuncid[0], hashstate->skew_hashfunction);
+		}
+
+		/* no need to keep these */
+		pfree(outer_hashfuncid);
+		pfree(inner_hashfuncid);
+		pfree(hash_strict);
 	}
 
 	/*
@@ -836,11 +920,6 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	hjstate->hj_CurSkewBucketNo = INVALID_SKEW_BUCKET_NO;
 	hjstate->hj_CurTuple = NULL;
 
-	hjstate->hj_OuterHashKeys = ExecInitExprList(node->hashkeys,
-												 (PlanState *) hjstate);
-	hjstate->hj_HashOperators = node->hashoperators;
-	hjstate->hj_Collations = node->hashcollations;
-
 	hjstate->hj_JoinState = HJ_BUILD_HASHTABLE;
 	hjstate->hj_MatchedOuter = false;
 	hjstate->hj_OuterNotEmpty = false;
@@ -908,17 +987,22 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode,
 
 		while (!TupIsNull(slot))
 		{
+			bool		isnull;
+
 			/*
 			 * We have to compute the tuple's hash value.
 			 */
 			ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
 
 			econtext->ecxt_outertuple = slot;
-			if (ExecHashGetHashValue(hashtable, econtext,
-									 hjstate->hj_OuterHashKeys,
-									 true,	/* outer tuple */
-									 HJ_FILL_OUTER(hjstate),
-									 hashvalue))
+
+			ResetExprContext(econtext);
+
+			*hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+																  econtext,
+																  &isnull));
+
+			if (!isnull)
 			{
 				/* remember outer relation is not empty for possible rescan */
 				hjstate->hj_OuterNotEmpty = true;
@@ -979,14 +1063,19 @@ ExecParallelHashJoinOuterGetTuple(PlanState *outerNode,
 
 		while (!TupIsNull(slot))
 		{
+			bool		isnull;
+
 			ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
 
 			econtext->ecxt_outertuple = slot;
-			if (ExecHashGetHashValue(hashtable, econtext,
-									 hjstate->hj_OuterHashKeys,
-									 true,	/* outer tuple */
-									 HJ_FILL_OUTER(hjstate),
-									 hashvalue))
+
+			ResetExprContext(econtext);
+
+			*hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+																  econtext,
+																  &isnull));
+
+			if (!isnull)
 				return slot;
 
 			/*
@@ -1508,15 +1597,20 @@ ExecParallelHashJoinPartitionOuter(HashJoinState *hjstate)
 	/* Execute outer plan, writing all tuples to shared tuplestores. */
 	for (;;)
 	{
+		bool		isnull;
+
 		slot = ExecProcNode(outerState);
 		if (TupIsNull(slot))
 			break;
 		econtext->ecxt_outertuple = slot;
-		if (ExecHashGetHashValue(hashtable, econtext,
-								 hjstate->hj_OuterHashKeys,
-								 true,	/* outer tuple */
-								 HJ_FILL_OUTER(hjstate),
-								 &hashvalue))
+
+		ResetExprContext(econtext);
+
+		hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+															 econtext,
+															 &isnull));
+
+		if (!isnull)
 		{
 			int			batchno;
 			int			bucketno;
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 9e0efd2668..39b8a8ebde 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -1912,6 +1912,133 @@ llvm_compile_expr(ExprState *state)
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
+			case EEOP_HASHDATUM_FIRST:
+			case EEOP_HASHDATUM_FIRST_STRICT:
+			case EEOP_HASHDATUM_NEXT32:
+			case EEOP_HASHDATUM_NEXT32_STRICT:
+				{
+					FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+					LLVMValueRef v_fcinfo_isnull;
+					LLVMValueRef v_retval;
+					LLVMBasicBlockRef b_checkargnull;
+					LLVMValueRef v_fcinfo;
+					LLVMBasicBlockRef b_ifnotnull;
+					LLVMBasicBlockRef b_ifnullblock;
+					LLVMValueRef v_argisnull;
+					LLVMValueRef v_prevhash = NULL;
+
+					/*
+					 * Block for the actual function call, if args are
+					 * non-NULL.
+					 */
+					b_ifnotnull = l_bb_before_v(opblocks[opno + 1],
+												"b.%d.ifnotnull",
+												opno);
+
+					/* we expect the hash function to have 1 argument */
+					if (fcinfo->nargs != 1)
+						elog(ERROR, "incorrect number of function arguments");
+
+					v_fcinfo = l_ptr_const(fcinfo,
+										   l_ptr(StructFunctionCallInfoData));
+
+					b_checkargnull = l_bb_before_v(b_ifnotnull,
+												   "b.%d.isnull.0", opno);
+
+					/* jump to checking if the arg is NULL */
+					LLVMBuildBr(b, b_checkargnull);
+
+					/*
+					 * Determine what to do if we find the argument to be
+					 * NULL.  When processing a strict opcode we set resnull
+					 * to true and goto jumpdone.  Otherwise, goto the next
+					 * opblock.
+					 */
+					if (opcode == EEOP_HASHDATUM_FIRST_STRICT ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						b_ifnullblock = l_bb_before_v(b_ifnotnull,
+													  "b.%d.strictnull",
+													  opno);
+
+						LLVMPositionBuilderAtEnd(b, b_ifnullblock);
+
+						/* set resnull to true and jumpdone */
+						LLVMBuildStore(b, l_sbool_const(1), v_resnullp);
+						LLVMBuildStore(b, l_sizet_const(0), v_resvaluep);
+						LLVMBuildBr(b, opblocks[op->d.hashdatum.jumpdone]);
+					}
+					else
+						b_ifnullblock = opblocks[opno + 1];
+
+					LLVMPositionBuilderAtEnd(b, b_checkargnull);
+
+					/* check if the input parameter is NULL */
+					v_argisnull = l_funcnull(b, v_fcinfo, 0);
+					LLVMBuildCondBr(b,
+									LLVMBuildICmp(b,
+												  LLVMIntEQ,
+												  v_argisnull,
+												  l_sbool_const(1),
+												  ""),
+									b_ifnullblock,
+									b_ifnotnull);
+
+					LLVMPositionBuilderAtEnd(b, b_ifnotnull);
+
+					/*
+					 * If not hashing the first value then we must load the
+					 * previous hash value and rotate it left one bit.
+					 */
+					if (opcode == EEOP_HASHDATUM_NEXT32 ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						LLVMValueRef v_tmp1;
+						LLVMValueRef v_tmp2;
+
+						/*
+						 * Fetch the previously hashed value from where the
+						 * EEOP_HASHDATUM_FIRST* operation stored it.
+						 */
+						v_prevhash = l_load(b, TypeSizeT, v_resvaluep, "");
+
+						/*
+						 * Rotate bits left by 1 bit.  Be careful not to
+						 * overflow uint32 when working with size_t.
+						 */
+						v_tmp1 = LLVMBuildShl(b, v_prevhash, l_sizet_const(1),
+											  "");
+						v_tmp1 = LLVMBuildAnd(b, v_tmp1,
+											  l_sizet_const(0xffffffff), "");
+						v_tmp2 = LLVMBuildLShr(b, v_prevhash,
+											   l_sizet_const(31), "");
+						v_prevhash = LLVMBuildOr(b, v_tmp1, v_tmp2, "");
+					}
+
+					/* call the hash function */
+					v_retval = BuildV1Call(context,
+										   b,
+										   mod,
+										   fcinfo,
+										   &v_fcinfo_isnull);
+
+					if (opcode == EEOP_HASHDATUM_NEXT32 ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						/*
+						 * Perform XOR (^) operation to combine the previous
+						 * and the just calculated hash value.
+						 */
+						v_retval = LLVMBuildXor(b, v_prevhash, v_retval, "");
+					}
+
+					LLVMBuildStore(b, v_retval, v_resvaluep);
+					LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+
+					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}
+
 			case EEOP_XMLEXPR:
 				build_EvalXFunc(b, mod, "ExecEvalXmlExpr",
 								v_state, op);
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 64698202a5..0cce79a392 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -237,6 +237,10 @@ typedef enum ExprEvalOp
 	EEOP_CONVERT_ROWTYPE,
 	EEOP_SCALARARRAYOP,
 	EEOP_HASHED_SCALARARRAYOP,
+	EEOP_HASHDATUM_FIRST,
+	EEOP_HASHDATUM_FIRST_STRICT,
+	EEOP_HASHDATUM_NEXT32,
+	EEOP_HASHDATUM_NEXT32_STRICT,
 	EEOP_XMLEXPR,
 	EEOP_JSON_CONSTRUCTOR,
 	EEOP_IS_JSON,
@@ -593,6 +597,16 @@ typedef struct ExprEvalStep
 			ScalarArrayOpExpr *saop;
 		}			hashedscalararrayop;
 
+		/* for EEOP_HASHDATUM_(FIRST|NEXT32)[_STRICT] */
+		struct
+		{
+			FmgrInfo   *finfo;	/* function's lookup data */
+			FunctionCallInfo fcinfo_data;	/* arguments etc */
+			/* faster to access without additional indirection: */
+			PGFunction	fn_addr;	/* actual call address */
+			int			jumpdone;	/* jump here on null */
+		}			hashdatum;
+
 		/* for EEOP_XMLEXPR */
 		struct
 		{
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9770752ea3..8fd3f06a55 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -285,6 +285,13 @@ extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
 extern List *ExecInitExprList(List *nodes, PlanState *parent);
 extern ExprState *ExecBuildAggTrans(AggState *aggstate, struct AggStatePerPhaseData *phase,
 									bool doSort, bool doHash, bool nullcheck);
+extern ExprState *ExecBuildHash32Expr(TupleDesc desc,
+									  const TupleTableSlotOps *ops,
+									  const Oid *hashfunc_oids,
+									  const List *collations,
+									  const List *hash_exprs,
+									  const bool *opstrict,
+									  PlanState *parent, bool keep_nulls);
 extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const TupleTableSlotOps *lops, const TupleTableSlotOps *rops,
 										 int numCols,
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 9197846cda..2d8ed8688c 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -313,8 +313,6 @@ typedef struct HashJoinTableData
 		dsa_pointer_atomic *shared;
 	}			buckets;
 
-	bool		keepNulls;		/* true to store unmatchable NULL tuples */
-
 	bool		skewEnabled;	/* are we using skew optimization? */
 	HashSkewBucket **skewBucket;	/* hashtable of skew buckets */
 	int			skewBucketLen;	/* size of skewBucket array (a power of 2!) */
@@ -343,16 +341,6 @@ typedef struct HashJoinTableData
 	BufFile   **innerBatchFile; /* buffered virtual temp file per batch */
 	BufFile   **outerBatchFile; /* buffered virtual temp file per batch */
 
-	/*
-	 * Info about the datatype-specific hash functions for the datatypes being
-	 * hashed. These are arrays of the same length as the number of hash join
-	 * clauses (hash keys).
-	 */
-	FmgrInfo   *outer_hashfunctions;	/* lookup data for hash functions */
-	FmgrInfo   *inner_hashfunctions;	/* lookup data for hash functions */
-	bool	   *hashStrict;		/* is each hash join operator strict? */
-	Oid		   *collations;
-
 	Size		spaceUsed;		/* memory space currently used by tuples */
 	Size		spaceAllowed;	/* upper limit for space used */
 	Size		spacePeak;		/* peak space used */
diff --git a/src/include/executor/nodeHash.h b/src/include/executor/nodeHash.h
index a95911c2fe..28dd2229f8 100644
--- a/src/include/executor/nodeHash.h
+++ b/src/include/executor/nodeHash.h
@@ -24,8 +24,7 @@ extern Node *MultiExecHash(HashState *node);
 extern void ExecEndHash(HashState *node);
 extern void ExecReScanHash(HashState *node);
 
-extern HashJoinTable ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
-										 bool keepNulls);
+extern HashJoinTable ExecHashTableCreate(HashState *state);
 extern void ExecParallelHashTableAlloc(HashJoinTable hashtable,
 									   int batchno);
 extern void ExecHashTableDestroy(HashJoinTable hashtable);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 8bc421e7c0..059779f238 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2157,8 +2157,7 @@ typedef struct MergeJoinState
  *	 HashJoinState information
  *
  *		hashclauses				original form of the hashjoin condition
- *		hj_OuterHashKeys		the outer hash keys in the hashjoin condition
- *		hj_HashOperators		the join operators in the hashjoin condition
+ *		hj_OuterHash			ExprState for hashing outer keys
  *		hj_HashTable			hash table for the hashjoin
  *								(NULL if table not built yet)
  *		hj_CurHashValue			hash value for current outer tuple
@@ -2188,9 +2187,7 @@ typedef struct HashJoinState
 {
 	JoinState	js;				/* its first field is NodeTag */
 	ExprState  *hashclauses;
-	List	   *hj_OuterHashKeys;	/* list of ExprState nodes */
-	List	   *hj_HashOperators;	/* list of operator OIDs */
-	List	   *hj_Collations;
+	ExprState  *hj_OuterHash;
 	HashJoinTable hj_HashTable;
 	uint32		hj_CurHashValue;
 	int			hj_CurBucketNo;
@@ -2743,7 +2740,10 @@ typedef struct HashState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	HashJoinTable hashtable;	/* hash table for the hashjoin */
-	List	   *hashkeys;		/* list of ExprState nodes */
+	ExprState  *hash_expr;		/* ExprState to get hash value */
+
+	FmgrInfo   *skew_hashfunction;	/* lookup data for skew hash function */
+	Oid			skew_collation; /* collation to call skew_hashfunction with */
 
 	/*
 	 * In a parallelized hash join, the leader retains a pointer to the
-- 
2.34.1

#2David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#1)
1 attachment(s)
Re: Speed up Hash Join by teaching ExprState about hashing

On Mon, 13 May 2024 at 21:23, David Rowley <dgrowleyml@gmail.com> wrote:

In master, if you look at ExecHashGetHashValue() in nodeHash.c, you
can see that it calls ExecEvalExpr() and then manually calls the hash
function on the returned value. This process is repeated once for each
hash key. This is inefficient for a few reasons:

1) ExecEvalExpr() will only deform tuples up the max varattno that's
mentioned in the hash key. That means we might have to deform
attributes in multiple steps, once for each hash key.
2) ExecHashGetHashValue() is very branchy and checks if hashStrict[]
and keep_nulls on every loop. There's also a branch to check which
hash functions to use.
3) foreach isn't exactly the pinnacle of efficiency either.

All of the above points can be improved by making ExprState handle
hashing. This means we'll deform all attributes that are needed for
hashing once, rather than incrementally once per key. This also allows
JIT compilation of hashing ExprStates, which will make things even
faster.

The attached patch implements this. Here are some performance numbers.

I've been doing a bit more work on this to start to add support for
faster hashing for hashing needs other than Hash Join. In the
attached, I've added support to give the hash value an initial value.
Support for that is required to allow Hash Aggregate to work. If you
look at what's being done now inside BuildTupleHashTableExt(), you'll
see that "hash_iv" exists there to allow an initial hash value. This
seems to be getting used to allow some variation in hash values
calculated inside parallel workers, per hashtable->hash_iv =
murmurhash32(ParallelWorkerNumber). One of my aims for this patch is
to always produce the same hash value before and after the patch, so
I've gone and implemented the equivalent functionality which can be
enabled or disabled as required depending on the use case.

I've not added support for Hash Aggregate quite yet. I did look at
doing that, but it seems to need quite a bit of refactoring to do it
nicely. The problem is that BuildTupleHashTableExt() receives
keyColIdx with the attribute numbers to hash. The new
ExecBuildHash32Expr() function requires a List of Exprs. It looks
like the keyColIdx array comes directly from the planner which is many
layers up and would need lots of code churn of function signatures to
change. While I could form Vars using the keyColIdx array to populate
the required List of Exprs, I so far can't decide where exactly that
should happen. I think probably the planner should form the Expr List.
It seems a bit strange to be doing makeVar() in the executor.

I currently think that it's fine to speed up Hash Join as phase one
for this patch. I can work more on improving hash value generation in
other locations later.

I'd be happy if someone else were to give this patch a review and
test. One part I struggled a bit with was finding a way to cast the
Size variable down to uint32 in LLVM. I tried to add a new supported
type for uint32 but just couldn't get it to work. Instead, I did:

v_tmp1 = LLVMBuildAnd(b, v_tmp1,
l_sizet_const(0xffffffff), "");

which works and I imagine compiled to the same code as a cast. It
just looks a bit strange.

David

Attachments:

v2-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patchapplication/octet-stream; name=v2-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patchDownload
From 2b8c715bc42c5b8cbdceefb1ef046567e84b18e7 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Fri, 10 May 2024 20:22:07 +1200
Subject: [PATCH v2] Speed up Hash Join by making ExprStates hash

---
 src/backend/executor/execExpr.c       | 139 ++++++++++++++++++
 src/backend/executor/execExprInterp.c | 102 ++++++++++++++
 src/backend/executor/nodeHash.c       | 194 ++++++--------------------
 src/backend/executor/nodeHashjoin.c   | 144 +++++++++++++++----
 src/backend/jit/llvm/llvmjit_expr.c   | 139 ++++++++++++++++++
 src/include/executor/execExpr.h       |  24 ++++
 src/include/executor/executor.h       |   7 +
 src/include/executor/hashjoin.h       |  12 --
 src/include/executor/nodeHash.h       |   3 +-
 src/include/nodes/execnodes.h         |  12 +-
 10 files changed, 580 insertions(+), 196 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index ccd4863778..d35aa4155c 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3931,6 +3931,145 @@ ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 	}
 }
 
+/*
+ * Build an ExprState that calls the given hash function(s) on the given
+ * 'hash_exprs'.  When multiple expressions are present, the hash values
+ * returned by each function are combined to produce a single hash value.
+ *
+ * desc: tuple descriptor to extract 'hash_exprs' from
+ * ops: TupleTableSlotOps for the TupleDesc
+ * hashfunc_oids: Oid for each hash function to call, one for each 'hash_expr'
+ * collations: collation to use when calling the hash function.
+ * hash_expr: list of expressions to hash the value of
+ * opstrict: array corresponding to the 'hashfunc_oids' to store op_strict()
+ * parent: PlanState node that the 'hash_exprs' will be evaluated at
+ * init_value: Normally 0, but can be set to other values to seed the hash
+ * with some other value.  Using non-zero is slightly less efficient but can
+ * be useful.
+ * keep_nulls: if true, evaluation of the returned ExprState will abort early
+ * returning NULL if the given hash function is strict and the Datum to hash
+ * is null.  When set to false, any NULL input Datums are skipped and the
+ * resulting hash value is unchanged, i.e. it's a hash of all non-null
+ * 'hash_exprs', or 0 if all 'hash_exprs' evaluate to NULL Datums.
+ */
+ExprState *
+ExecBuildHash32Expr(TupleDesc desc, const TupleTableSlotOps *ops,
+					const Oid *hashfunc_oids, const List *collations,
+					const List *hash_exprs, const bool *opstrict,
+					PlanState *parent, uint32 init_value, bool keep_nulls)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	intptr_t	strict_opcode;
+	intptr_t	opcode;
+
+	Assert(list_length(hash_exprs) == list_length(collations));
+
+	state->parent = parent;
+
+	/* Insert setup steps as needed */
+	ExecCreateExprSetupSteps(state, (Node *) hash_exprs);
+
+	if (init_value == 0)
+	{
+		/*
+		 * No initial value, so we can assign the result of the hash function
+		 * for the first hash_expr without having to concern ourselves with
+		 * combining the result with any initial value.
+		 */
+		strict_opcode = EEOP_HASHDATUM_FIRST_STRICT;
+		opcode = EEOP_HASHDATUM_FIRST;
+	}
+	else
+	{
+		/* set up operation to set the initial value */
+		scratch.opcode = EEOP_HASHDATUM_SET_INITVAL;
+		scratch.d.hashdatum_initvalue.init_value = UInt32GetDatum(init_value);
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+
+		ExprEvalPushStep(state, &scratch);
+
+		/*
+		 * When using an initial value, use the NEXT32 ops as the FIRST* ops
+		 * would overwrite the stored initial value.
+		 */
+		strict_opcode = EEOP_HASHDATUM_NEXT32_STRICT;
+		opcode = EEOP_HASHDATUM_NEXT32;
+	}
+
+	forboth(lc, hash_exprs, lc2, collations)
+	{
+		Expr	   *expr = (Expr *) lfirst(lc);
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		int			i = foreach_current_index(lc);
+		Oid			funcid;
+		Oid			inputcollid = lfirst_oid(lc2);
+
+		funcid = hashfunc_oids[i];
+
+		/* Allocate hash function lookup data  */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(1));
+
+		fmgr_info(funcid, finfo);
+
+		/*
+		 * Build the steps to evaluate the hash function's argument have it so
+		 * the value of that is stored in the 0th argument of the hash func.
+		 */
+		ExecInitExprRec(expr,
+						state,
+						&fcinfo->args[0].value,
+						&fcinfo->args[0].isnull);
+
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+
+		/* Initialize function call parameter structure too */
+		InitFunctionCallInfoData(*fcinfo, finfo, 1, inputcollid, NULL, NULL);
+
+		scratch.d.hashdatum.finfo = finfo;
+		scratch.d.hashdatum.fcinfo_data = fcinfo;
+		scratch.d.hashdatum.fn_addr = finfo->fn_addr;
+
+		scratch.opcode = opstrict[i] && !keep_nulls ? strict_opcode : opcode;
+		scratch.d.hashdatum.jumpdone = -1;
+
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps, state->steps_len - 1);
+
+		/*
+		 * For subsequent keys we must combine the hash value with the
+		 * previous hashes.
+		 */
+		strict_opcode = EEOP_HASHDATUM_NEXT32_STRICT;
+		opcode = EEOP_HASHDATUM_NEXT32;
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->d.hashdatum.jumpdone == -1);
+		as->d.hashdatum.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
+
 /*
  * Build equality expression that can be evaluated using ExecQual(), returning
  * true if the expression context's inner/outer tuple are NOT DISTINCT. I.e
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index d8735286c4..866cc79ad5 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -476,6 +476,11 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_DOMAIN_TESTVAL,
 		&&CASE_EEOP_DOMAIN_NOTNULL,
 		&&CASE_EEOP_DOMAIN_CHECK,
+		&&CASE_EEOP_HASHDATUM_SET_INITVAL,
+		&&CASE_EEOP_HASHDATUM_FIRST,
+		&&CASE_EEOP_HASHDATUM_FIRST_STRICT,
+		&&CASE_EEOP_HASHDATUM_NEXT32,
+		&&CASE_EEOP_HASHDATUM_NEXT32_STRICT,
 		&&CASE_EEOP_CONVERT_ROWTYPE,
 		&&CASE_EEOP_SCALARARRAYOP,
 		&&CASE_EEOP_HASHED_SCALARARRAYOP,
@@ -1535,6 +1540,103 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			EEO_NEXT();
 		}
 
+		EEO_CASE(EEOP_HASHDATUM_SET_INITVAL)
+		{
+			*op->resvalue = op->d.hashdatum_initvalue.init_value;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_FIRST)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			/* leave the hash value alone on NULL inputs */
+			if (!fcinfo->args[0].isnull)
+			{
+				/* execute the hash function and save the resulting value */
+				*op->resvalue = op->d.hashdatum.fn_addr(fcinfo);
+				*op->resnull = false;
+			}
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_FIRST_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			if (fcinfo->args[0].isnull)
+			{
+				/*
+				 * With strict we have the expression return NULL instead of
+				 * ignoring NULL input values.  We've nothing more to do after
+				 * finding a NULL.
+				 */
+				*op->resnull = true;
+				*op->resvalue = (Datum) 0;
+				EEO_JUMP(op->d.hashdatum.jumpdone);
+			}
+
+			/* execute the hash function and save the resulting value */
+			*op->resvalue = op->d.hashdatum.fn_addr(fcinfo);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_NEXT32)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			/* leave the hash value alone on NULL inputs */
+			if (!fcinfo->args[0].isnull)
+			{
+				uint32		existing_hash = DatumGetUInt32(*op->resvalue);
+				uint32		hashvalue;
+
+				/* combine successive hash values by rotating */
+				existing_hash = pg_rotate_left32(existing_hash, 1);
+
+				/* execute hash func and combine with previous hash value */
+				hashvalue = DatumGetUInt32(op->d.hashdatum.fn_addr(fcinfo));
+				*op->resvalue = UInt32GetDatum(existing_hash ^ hashvalue);
+				*op->resnull = false;
+			}
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_NEXT32_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+			uint32		existing_hash;
+			uint32		hashvalue;
+
+			if (fcinfo->args[0].isnull)
+			{
+				/*
+				 * With strict we have the expression return NULL instead of
+				 * ignoring NULL input values.  We've nothing more to do after
+				 * finding a NULL.
+				 */
+				*op->resnull = true;
+				*op->resvalue = (Datum) 0;
+				EEO_JUMP(op->d.hashdatum.jumpdone);
+			}
+
+			existing_hash = DatumGetUInt32(*op->resvalue);
+			/* combine successive hash values by rotating */
+			existing_hash = pg_rotate_left32(existing_hash, 1);
+
+			/* execute hash func and combine with previous hash value */
+			hashvalue = DatumGetUInt32(op->d.hashdatum.fn_addr(fcinfo));
+			*op->resvalue = UInt32GetDatum(existing_hash ^ hashvalue);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
 		EEO_CASE(EEOP_XMLEXPR)
 		{
 			/* too complex for an inline implementation */
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 61480733a1..ea27bc5ca4 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -47,7 +47,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
-static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
+static void ExecHashBuildSkewHash(HashState *hashstate,
+								  HashJoinTable hashtable, Hash *node,
 								  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 									TupleTableSlot *slot,
@@ -138,11 +139,9 @@ static void
 MultiExecPrivateHash(HashState *node)
 {
 	PlanState  *outerNode;
-	List	   *hashkeys;
 	HashJoinTable hashtable;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
-	uint32		hashvalue;
 
 	/*
 	 * get state info from node
@@ -153,7 +152,6 @@ MultiExecPrivateHash(HashState *node)
 	/*
 	 * set expression context
 	 */
-	hashkeys = node->hashkeys;
 	econtext = node->ps.ps_ExprContext;
 
 	/*
@@ -162,17 +160,30 @@ MultiExecPrivateHash(HashState *node)
 	 */
 	for (;;)
 	{
+		bool		isnull;
+		Datum		hashdatum;
+
 		slot = ExecProcNode(outerNode);
 		if (TupIsNull(slot))
 			break;
 		/* We have to compute the hash value */
 		econtext->ecxt_outertuple = slot;
-		if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
-								 false, hashtable->keepNulls,
-								 &hashvalue))
+
+		ResetExprContext(econtext);
+
+		hashdatum = ExecEvalExprSwitchContext(node->hash_expr, econtext,
+											  &isnull);
+
+		if (!isnull)
 		{
+			uint32		hashvalue = DatumGetUInt32(hashdatum);
 			int			bucketNumber;
 
+#if USE_ASSERT_CHECKING
+			/* XXX just for testing. Not for commit */
+			elog(DEBUG1, "hashdatum = " UINT64_FORMAT, (uint64) hashdatum);
+#endif
+
 			bucketNumber = ExecHashGetSkewBucket(hashtable, hashvalue);
 			if (bucketNumber != INVALID_SKEW_BUCKET_NO)
 			{
@@ -215,7 +226,6 @@ MultiExecParallelHash(HashState *node)
 {
 	ParallelHashJoinState *pstate;
 	PlanState  *outerNode;
-	List	   *hashkeys;
 	HashJoinTable hashtable;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
@@ -232,7 +242,6 @@ MultiExecParallelHash(HashState *node)
 	/*
 	 * set expression context
 	 */
-	hashkeys = node->hashkeys;
 	econtext = node->ps.ps_ExprContext;
 
 	/*
@@ -279,13 +288,20 @@ MultiExecParallelHash(HashState *node)
 			ExecParallelHashTableSetCurrentBatch(hashtable, 0);
 			for (;;)
 			{
+				bool		isnull;
+
 				slot = ExecProcNode(outerNode);
 				if (TupIsNull(slot))
 					break;
 				econtext->ecxt_outertuple = slot;
-				if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
-										 false, hashtable->keepNulls,
-										 &hashvalue))
+
+				ResetExprContext(econtext);
+
+				hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(node->hash_expr,
+																	 econtext,
+																	 &isnull));
+
+				if (!isnull)
 					ExecParallelHashTableInsert(hashtable, slot, hashvalue);
 				hashtable->partialTuples++;
 			}
@@ -371,8 +387,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	hashstate->ps.plan = (Plan *) node;
 	hashstate->ps.state = estate;
 	hashstate->ps.ExecProcNode = ExecHash;
+	/* delay building hashtable until ExecHashTableCreate() in executor run */
 	hashstate->hashtable = NULL;
-	hashstate->hashkeys = NIL;	/* will be set by parent HashJoin */
 
 	/*
 	 * Miscellaneous initialization
@@ -393,12 +409,15 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	ExecInitResultTupleSlotTL(&hashstate->ps, &TTSOpsMinimalTuple);
 	hashstate->ps.ps_ProjInfo = NULL;
 
+	Assert(node->plan.qual == NIL);
+
 	/*
-	 * initialize child expressions
+	 * Delay initialization of hash_expr until ExecInitHashJoin().  We cannot
+	 * build the ExprState here as we don't yet know the join type we're going
+	 * to be hashing values for and we need to know that before calling
+	 * ExecBuildHash32Expr as the keep_nulls parameter depends on the join
+	 * type.
 	 */
-	Assert(node->plan.qual == NIL);
-	hashstate->hashkeys =
-		ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
 
 	return hashstate;
 }
@@ -429,7 +448,7 @@ ExecEndHash(HashState *node)
  * ----------------------------------------------------------------
  */
 HashJoinTable
-ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations, bool keepNulls)
+ExecHashTableCreate(HashState *state)
 {
 	Hash	   *node;
 	HashJoinTable hashtable;
@@ -440,10 +459,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	double		rows;
 	int			num_skew_mcvs;
 	int			log2_nbuckets;
-	int			nkeys;
-	int			i;
-	ListCell   *ho;
-	ListCell   *hc;
 	MemoryContext oldcxt;
 
 	/*
@@ -487,7 +502,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	hashtable->log2_nbuckets = log2_nbuckets;
 	hashtable->log2_nbuckets_optimal = log2_nbuckets;
 	hashtable->buckets.unshared = NULL;
-	hashtable->keepNulls = keepNulls;
 	hashtable->skewEnabled = false;
 	hashtable->skewBucket = NULL;
 	hashtable->skewBucketLen = 0;
@@ -540,32 +554,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 
 	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
 
-	/*
-	 * Get info about the hash functions to be used for each hash key. Also
-	 * remember whether the join operators are strict.
-	 */
-	nkeys = list_length(hashOperators);
-	hashtable->outer_hashfunctions = palloc_array(FmgrInfo, nkeys);
-	hashtable->inner_hashfunctions = palloc_array(FmgrInfo, nkeys);
-	hashtable->hashStrict = palloc_array(bool, nkeys);
-	hashtable->collations = palloc_array(Oid, nkeys);
-	i = 0;
-	forboth(ho, hashOperators, hc, hashCollations)
-	{
-		Oid			hashop = lfirst_oid(ho);
-		Oid			left_hashfn;
-		Oid			right_hashfn;
-
-		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
-			elog(ERROR, "could not find hash function for hash operator %u",
-				 hashop);
-		fmgr_info(left_hashfn, &hashtable->outer_hashfunctions[i]);
-		fmgr_info(right_hashfn, &hashtable->inner_hashfunctions[i]);
-		hashtable->hashStrict[i] = op_strict(hashop);
-		hashtable->collations[i] = lfirst_oid(hc);
-		i++;
-	}
-
 	if (nbatch > 1 && hashtable->parallel_state == NULL)
 	{
 		MemoryContext oldctx;
@@ -652,7 +640,7 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 		 * it.)
 		 */
 		if (nbatch > 1)
-			ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+			ExecHashBuildSkewHash(state, hashtable, node, num_skew_mcvs);
 
 		MemoryContextSwitchTo(oldcxt);
 	}
@@ -1803,103 +1791,6 @@ ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable,
 		heap_free_minimal_tuple(tuple);
 }
 
-/*
- * ExecHashGetHashValue
- *		Compute the hash value for a tuple
- *
- * The tuple to be tested must be in econtext->ecxt_outertuple (thus Vars in
- * the hashkeys expressions need to have OUTER_VAR as varno). If outer_tuple
- * is false (meaning it's the HashJoin's inner node, Hash), econtext,
- * hashkeys, and slot need to be from Hash, with hashkeys/slot referencing and
- * being suitable for tuples from the node below the Hash. Conversely, if
- * outer_tuple is true, econtext is from HashJoin, and hashkeys/slot need to
- * be appropriate for tuples from HashJoin's outer node.
- *
- * A true result means the tuple's hash value has been successfully computed
- * and stored at *hashvalue.  A false result means the tuple cannot match
- * because it contains a null attribute, and hence it should be discarded
- * immediately.  (If keep_nulls is true then false is never returned.)
- */
-bool
-ExecHashGetHashValue(HashJoinTable hashtable,
-					 ExprContext *econtext,
-					 List *hashkeys,
-					 bool outer_tuple,
-					 bool keep_nulls,
-					 uint32 *hashvalue)
-{
-	uint32		hashkey = 0;
-	FmgrInfo   *hashfunctions;
-	ListCell   *hk;
-	int			i = 0;
-	MemoryContext oldContext;
-
-	/*
-	 * We reset the eval context each time to reclaim any memory leaked in the
-	 * hashkey expressions.
-	 */
-	ResetExprContext(econtext);
-
-	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
-	if (outer_tuple)
-		hashfunctions = hashtable->outer_hashfunctions;
-	else
-		hashfunctions = hashtable->inner_hashfunctions;
-
-	foreach(hk, hashkeys)
-	{
-		ExprState  *keyexpr = (ExprState *) lfirst(hk);
-		Datum		keyval;
-		bool		isNull;
-
-		/* combine successive hashkeys by rotating */
-		hashkey = pg_rotate_left32(hashkey, 1);
-
-		/*
-		 * Get the join attribute value of the tuple
-		 */
-		keyval = ExecEvalExpr(keyexpr, econtext, &isNull);
-
-		/*
-		 * If the attribute is NULL, and the join operator is strict, then
-		 * this tuple cannot pass the join qual so we can reject it
-		 * immediately (unless we're scanning the outside of an outer join, in
-		 * which case we must not reject it).  Otherwise we act like the
-		 * hashcode of NULL is zero (this will support operators that act like
-		 * IS NOT DISTINCT, though not any more-random behavior).  We treat
-		 * the hash support function as strict even if the operator is not.
-		 *
-		 * Note: currently, all hashjoinable operators must be strict since
-		 * the hash index AM assumes that.  However, it takes so little extra
-		 * code here to allow non-strict that we may as well do it.
-		 */
-		if (isNull)
-		{
-			if (hashtable->hashStrict[i] && !keep_nulls)
-			{
-				MemoryContextSwitchTo(oldContext);
-				return false;	/* cannot match */
-			}
-			/* else, leave hashkey unmodified, equivalent to hashcode 0 */
-		}
-		else
-		{
-			/* Compute the hash function */
-			uint32		hkey;
-
-			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i], hashtable->collations[i], keyval));
-			hashkey ^= hkey;
-		}
-
-		i++;
-	}
-
-	MemoryContextSwitchTo(oldContext);
-
-	*hashvalue = hashkey;
-	return true;
-}
 
 /*
  * ExecHashGetBucketAndBatch
@@ -2372,7 +2263,8 @@ ExecReScanHash(HashState *node)
  *		based on available memory.
  */
 static void
-ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
+ExecHashBuildSkewHash(HashState *hashstate, HashJoinTable hashtable,
+					  Hash *node, int mcvsToUse)
 {
 	HeapTupleData *statsTuple;
 	AttStatsSlot sslot;
@@ -2400,7 +2292,6 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 	{
 		double		frac;
 		int			nbuckets;
-		FmgrInfo   *hashfunctions;
 		int			i;
 
 		if (mcvsToUse > sslot.nvalues)
@@ -2468,15 +2359,14 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 		 * ExecHashRemoveNextSkewBucket) and we want the least common MCVs to
 		 * be removed first.
 		 */
-		hashfunctions = hashtable->outer_hashfunctions;
 
 		for (i = 0; i < mcvsToUse; i++)
 		{
 			uint32		hashvalue;
 			int			bucket;
 
-			hashvalue = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[0],
-														 hashtable->collations[0],
+			hashvalue = DatumGetUInt32(FunctionCall1Coll(hashstate->skew_hashfunction,
+														 hashstate->skew_collation,
 														 sslot.values[i]));
 
 			/*
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 5429e68734..40d7436ddc 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -169,6 +169,7 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "utils/lsyscache.h"
 #include "utils/sharedtuplestore.h"
 #include "utils/wait_event.h"
 
@@ -331,10 +332,7 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
 				 * whoever gets here first will create the hash table and any
 				 * later arrivals will merely attach to it.
 				 */
-				hashtable = ExecHashTableCreate(hashNode,
-												node->hj_HashOperators,
-												node->hj_Collations,
-												HJ_FILL_INNER(node));
+				hashtable = ExecHashTableCreate(hashNode);
 				node->hj_HashTable = hashtable;
 
 				/*
@@ -820,9 +818,97 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	 */
 	{
 		HashState  *hashstate = (HashState *) innerPlanState(hjstate);
+		Hash	   *hash = (Hash *) hashstate->ps.plan;
 		TupleTableSlot *slot = hashstate->ps.ps_ResultTupleSlot;
+		Oid		   *outer_hashfuncid;
+		Oid		   *inner_hashfuncid;
+		bool	   *hash_strict;
+		ListCell   *lc;
+		int			nkeys;
+
 
 		hjstate->hj_HashTupleSlot = slot;
+
+		/*
+		 * Build ExprStates to obtain hash values for either side of the join.
+		 * This must be done here as ExecBuildHash32Expr needs to know how to
+		 * handle NULL inputs and the required handling of that depends on the
+		 * jointype.  We don't know the join type in ExecInitHash() and we
+		 * must build the ExprStates before ExecHashTableCreate() so we
+		 * properly attribute any SubPlans that exist in the hash expressions
+		 * to the correct PlanState.
+		 */
+		nkeys = list_length(node->hashoperators);
+
+		/* XXX worth putting these on the stack when nkeys == 1? */
+		outer_hashfuncid = palloc_array(Oid, nkeys);
+		inner_hashfuncid = palloc_array(Oid, nkeys);
+		hash_strict = palloc_array(bool, nkeys);
+
+		/*
+		 * Determine the hash function for each side of the join for the given
+		 * hash operator.
+		 */
+		foreach(lc, node->hashoperators)
+		{
+			Oid			hashop = lfirst_oid(lc);
+			int			i = foreach_current_index(lc);
+
+			if (!get_op_hash_functions(hashop,
+									   &outer_hashfuncid[i],
+									   &inner_hashfuncid[i]))
+				elog(ERROR,
+					 "could not find hash function for hash operator %u",
+					 hashop);
+			hash_strict[i] = op_strict(hashop);
+		}
+
+		/*
+		 * Build an ExprState to generate the hash value for the expressions
+		 * on the outer of the join.  This ExprState must finish generating
+		 * the hash value when HJ_FILL_OUTER() is true.  Otherwise,
+		 * ExecBuildHash32Expr will set up the ExprState to abort early if it
+		 * finds a NULL.  In these cases, we don't need to store these tuples
+		 * in the hash table as the jointype does not require it.
+		 */
+		hjstate->hj_OuterHash =
+			ExecBuildHash32Expr(hjstate->js.ps.ps_ResultTupleDesc,
+								hjstate->js.ps.resultops,
+								outer_hashfuncid,
+								node->hashcollations,
+								node->hashkeys,
+								hash_strict,
+								&hjstate->js.ps,
+								0,
+								HJ_FILL_OUTER(hjstate));
+
+		/* As above, but for the inner side of the join */
+		hashstate->hash_expr =
+			ExecBuildHash32Expr(hashstate->ps.ps_ResultTupleDesc,
+								hashstate->ps.resultops,
+								inner_hashfuncid,
+								node->hashcollations,
+								hash->hashkeys,
+								hash_strict,
+								&hashstate->ps,
+								0,
+								HJ_FILL_INNER(hjstate));
+
+		/*
+		 * Set up the skew table hash function while we have a record of the
+		 * first key's hash function Oid.
+		 */
+		if (OidIsValid(hash->skewTable))
+		{
+			hashstate->skew_hashfunction = palloc0(sizeof(FmgrInfo));
+			hashstate->skew_collation = linitial_oid(node->hashcollations);
+			fmgr_info(outer_hashfuncid[0], hashstate->skew_hashfunction);
+		}
+
+		/* no need to keep these */
+		pfree(outer_hashfuncid);
+		pfree(inner_hashfuncid);
+		pfree(hash_strict);
 	}
 
 	/*
@@ -846,11 +932,6 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	hjstate->hj_CurSkewBucketNo = INVALID_SKEW_BUCKET_NO;
 	hjstate->hj_CurTuple = NULL;
 
-	hjstate->hj_OuterHashKeys = ExecInitExprList(node->hashkeys,
-												 (PlanState *) hjstate);
-	hjstate->hj_HashOperators = node->hashoperators;
-	hjstate->hj_Collations = node->hashcollations;
-
 	hjstate->hj_JoinState = HJ_BUILD_HASHTABLE;
 	hjstate->hj_MatchedOuter = false;
 	hjstate->hj_OuterNotEmpty = false;
@@ -918,17 +999,22 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode,
 
 		while (!TupIsNull(slot))
 		{
+			bool		isnull;
+
 			/*
 			 * We have to compute the tuple's hash value.
 			 */
 			ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
 
 			econtext->ecxt_outertuple = slot;
-			if (ExecHashGetHashValue(hashtable, econtext,
-									 hjstate->hj_OuterHashKeys,
-									 true,	/* outer tuple */
-									 HJ_FILL_OUTER(hjstate),
-									 hashvalue))
+
+			ResetExprContext(econtext);
+
+			*hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+																  econtext,
+																  &isnull));
+
+			if (!isnull)
 			{
 				/* remember outer relation is not empty for possible rescan */
 				hjstate->hj_OuterNotEmpty = true;
@@ -989,14 +1075,19 @@ ExecParallelHashJoinOuterGetTuple(PlanState *outerNode,
 
 		while (!TupIsNull(slot))
 		{
+			bool		isnull;
+
 			ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
 
 			econtext->ecxt_outertuple = slot;
-			if (ExecHashGetHashValue(hashtable, econtext,
-									 hjstate->hj_OuterHashKeys,
-									 true,	/* outer tuple */
-									 HJ_FILL_OUTER(hjstate),
-									 hashvalue))
+
+			ResetExprContext(econtext);
+
+			*hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+																  econtext,
+																  &isnull));
+
+			if (!isnull)
 				return slot;
 
 			/*
@@ -1518,15 +1609,20 @@ ExecParallelHashJoinPartitionOuter(HashJoinState *hjstate)
 	/* Execute outer plan, writing all tuples to shared tuplestores. */
 	for (;;)
 	{
+		bool		isnull;
+
 		slot = ExecProcNode(outerState);
 		if (TupIsNull(slot))
 			break;
 		econtext->ecxt_outertuple = slot;
-		if (ExecHashGetHashValue(hashtable, econtext,
-								 hjstate->hj_OuterHashKeys,
-								 true,	/* outer tuple */
-								 HJ_FILL_OUTER(hjstate),
-								 &hashvalue))
+
+		ResetExprContext(econtext);
+
+		hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+															 econtext,
+															 &isnull));
+
+		if (!isnull)
 		{
 			int			batchno;
 			int			bucketno;
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 306aea82d3..e14cf167e0 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -1894,6 +1894,145 @@ llvm_compile_expr(ExprState *state)
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
+			case EEOP_HASHDATUM_SET_INITVAL:
+				{
+					LLVMValueRef v_initvalue;
+
+					v_initvalue = l_sizet_const(op->d.hashdatum_initvalue.init_value);
+
+					LLVMBuildStore(b, v_initvalue, v_resvaluep);
+					LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}
+
+			case EEOP_HASHDATUM_FIRST:
+			case EEOP_HASHDATUM_FIRST_STRICT:
+			case EEOP_HASHDATUM_NEXT32:
+			case EEOP_HASHDATUM_NEXT32_STRICT:
+				{
+					FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+					LLVMValueRef v_fcinfo_isnull;
+					LLVMValueRef v_retval;
+					LLVMBasicBlockRef b_checkargnull;
+					LLVMValueRef v_fcinfo;
+					LLVMBasicBlockRef b_ifnotnull;
+					LLVMBasicBlockRef b_ifnullblock;
+					LLVMValueRef v_argisnull;
+					LLVMValueRef v_prevhash = NULL;
+
+					/*
+					 * Block for the actual function call, if args are
+					 * non-NULL.
+					 */
+					b_ifnotnull = l_bb_before_v(opblocks[opno + 1],
+												"b.%d.ifnotnull",
+												opno);
+
+					/* we expect the hash function to have 1 argument */
+					if (fcinfo->nargs != 1)
+						elog(ERROR, "incorrect number of function arguments");
+
+					v_fcinfo = l_ptr_const(fcinfo,
+										   l_ptr(StructFunctionCallInfoData));
+
+					b_checkargnull = l_bb_before_v(b_ifnotnull,
+												   "b.%d.isnull.0", opno);
+
+					/* jump to checking if the arg is NULL */
+					LLVMBuildBr(b, b_checkargnull);
+
+					/*
+					 * Determine what to do if we find the argument to be
+					 * NULL.  When processing a strict opcode we set resnull
+					 * to true and goto jumpdone.  Otherwise, goto the next
+					 * opblock.
+					 */
+					if (opcode == EEOP_HASHDATUM_FIRST_STRICT ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						b_ifnullblock = l_bb_before_v(b_ifnotnull,
+													  "b.%d.strictnull",
+													  opno);
+
+						LLVMPositionBuilderAtEnd(b, b_ifnullblock);
+
+						/* set resnull to true and jumpdone */
+						LLVMBuildStore(b, l_sbool_const(1), v_resnullp);
+						LLVMBuildStore(b, l_sizet_const(0), v_resvaluep);
+						LLVMBuildBr(b, opblocks[op->d.hashdatum.jumpdone]);
+					}
+					else
+						b_ifnullblock = opblocks[opno + 1];
+
+					LLVMPositionBuilderAtEnd(b, b_checkargnull);
+
+					/* check if the input parameter is NULL */
+					v_argisnull = l_funcnull(b, v_fcinfo, 0);
+					LLVMBuildCondBr(b,
+									LLVMBuildICmp(b,
+												  LLVMIntEQ,
+												  v_argisnull,
+												  l_sbool_const(1),
+												  ""),
+									b_ifnullblock,
+									b_ifnotnull);
+
+					LLVMPositionBuilderAtEnd(b, b_ifnotnull);
+
+					/*
+					 * If not hashing the first value then we must load the
+					 * previous hash value and rotate it left one bit.
+					 */
+					if (opcode == EEOP_HASHDATUM_NEXT32 ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						LLVMValueRef v_tmp1;
+						LLVMValueRef v_tmp2;
+
+						/*
+						 * Fetch the previously hashed value from where the
+						 * EEOP_HASHDATUM_FIRST* operation stored it.
+						 */
+						v_prevhash = l_load(b, TypeSizeT, v_resvaluep, "");
+
+						/*
+						 * Rotate bits left by 1 bit.  Be careful not to
+						 * overflow uint32 when working with size_t.
+						 */
+						v_tmp1 = LLVMBuildShl(b, v_prevhash, l_sizet_const(1),
+											  "");
+						v_tmp1 = LLVMBuildAnd(b, v_tmp1,
+											  l_sizet_const(0xffffffff), "");
+						v_tmp2 = LLVMBuildLShr(b, v_prevhash,
+											   l_sizet_const(31), "");
+						v_prevhash = LLVMBuildOr(b, v_tmp1, v_tmp2, "");
+					}
+
+					/* call the hash function */
+					v_retval = BuildV1Call(context,
+										   b,
+										   mod,
+										   fcinfo,
+										   &v_fcinfo_isnull);
+
+					if (opcode == EEOP_HASHDATUM_NEXT32 ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						/*
+						 * Perform XOR (^) operation to combine the previous
+						 * and the just calculated hash value.
+						 */
+						v_retval = LLVMBuildXor(b, v_prevhash, v_retval, "");
+					}
+
+					LLVMBuildStore(b, v_retval, v_resvaluep);
+					LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+
+					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}
+
 			case EEOP_CONVERT_ROWTYPE:
 				build_EvalXFunc(b, mod, "ExecEvalConvertRowtype",
 								v_state, op, v_econtext);
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 55337d4916..a88d2406f5 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -233,6 +233,13 @@ typedef enum ExprEvalOp
 	/* evaluate a single domain CHECK constraint */
 	EEOP_DOMAIN_CHECK,
 
+	/* evaluation steps for hashing */
+	EEOP_HASHDATUM_SET_INITVAL,
+	EEOP_HASHDATUM_FIRST,
+	EEOP_HASHDATUM_FIRST_STRICT,
+	EEOP_HASHDATUM_NEXT32,
+	EEOP_HASHDATUM_NEXT32_STRICT,
+
 	/* evaluate assorted special-purpose expression types */
 	EEOP_CONVERT_ROWTYPE,
 	EEOP_SCALARARRAYOP,
@@ -556,6 +563,23 @@ typedef struct ExprEvalStep
 			ErrorSaveContext *escontext;
 		}			domaincheck;
 
+		/* for EEOP_HASH_SET_INITVAL */
+		struct
+		{
+			Datum		init_value;
+
+		}			hashdatum_initvalue;
+
+		/* for EEOP_HASHDATUM_(FIRST|NEXT32)[_STRICT] */
+		struct
+		{
+			FmgrInfo   *finfo;	/* function's lookup data */
+			FunctionCallInfo fcinfo_data;	/* arguments etc */
+			/* faster to access without additional indirection: */
+			PGFunction	fn_addr;	/* actual call address */
+			int			jumpdone;	/* jump here on null */
+		}			hashdatum;
+
 		/* for EEOP_CONVERT_ROWTYPE */
 		struct
 		{
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9770752ea3..046a7fb69b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -285,6 +285,13 @@ extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
 extern List *ExecInitExprList(List *nodes, PlanState *parent);
 extern ExprState *ExecBuildAggTrans(AggState *aggstate, struct AggStatePerPhaseData *phase,
 									bool doSort, bool doHash, bool nullcheck);
+extern ExprState *ExecBuildHash32Expr(TupleDesc desc,
+									  const TupleTableSlotOps *ops,
+									  const Oid *hashfunc_oids,
+									  const List *collations,
+									  const List *hash_exprs,
+									  const bool *opstrict, PlanState *parent,
+									  uint32 init_value, bool keep_nulls);
 extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const TupleTableSlotOps *lops, const TupleTableSlotOps *rops,
 										 int numCols,
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 9197846cda..2d8ed8688c 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -313,8 +313,6 @@ typedef struct HashJoinTableData
 		dsa_pointer_atomic *shared;
 	}			buckets;
 
-	bool		keepNulls;		/* true to store unmatchable NULL tuples */
-
 	bool		skewEnabled;	/* are we using skew optimization? */
 	HashSkewBucket **skewBucket;	/* hashtable of skew buckets */
 	int			skewBucketLen;	/* size of skewBucket array (a power of 2!) */
@@ -343,16 +341,6 @@ typedef struct HashJoinTableData
 	BufFile   **innerBatchFile; /* buffered virtual temp file per batch */
 	BufFile   **outerBatchFile; /* buffered virtual temp file per batch */
 
-	/*
-	 * Info about the datatype-specific hash functions for the datatypes being
-	 * hashed. These are arrays of the same length as the number of hash join
-	 * clauses (hash keys).
-	 */
-	FmgrInfo   *outer_hashfunctions;	/* lookup data for hash functions */
-	FmgrInfo   *inner_hashfunctions;	/* lookup data for hash functions */
-	bool	   *hashStrict;		/* is each hash join operator strict? */
-	Oid		   *collations;
-
 	Size		spaceUsed;		/* memory space currently used by tuples */
 	Size		spaceAllowed;	/* upper limit for space used */
 	Size		spacePeak;		/* peak space used */
diff --git a/src/include/executor/nodeHash.h b/src/include/executor/nodeHash.h
index a95911c2fe..28dd2229f8 100644
--- a/src/include/executor/nodeHash.h
+++ b/src/include/executor/nodeHash.h
@@ -24,8 +24,7 @@ extern Node *MultiExecHash(HashState *node);
 extern void ExecEndHash(HashState *node);
 extern void ExecReScanHash(HashState *node);
 
-extern HashJoinTable ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
-										 bool keepNulls);
+extern HashJoinTable ExecHashTableCreate(HashState *state);
 extern void ExecParallelHashTableAlloc(HashJoinTable hashtable,
 									   int batchno);
 extern void ExecHashTableDestroy(HashJoinTable hashtable);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cac684d9b3..7b03f641d4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2183,8 +2183,7 @@ typedef struct MergeJoinState
  *	 HashJoinState information
  *
  *		hashclauses				original form of the hashjoin condition
- *		hj_OuterHashKeys		the outer hash keys in the hashjoin condition
- *		hj_HashOperators		the join operators in the hashjoin condition
+ *		hj_OuterHash			ExprState for hashing outer keys
  *		hj_HashTable			hash table for the hashjoin
  *								(NULL if table not built yet)
  *		hj_CurHashValue			hash value for current outer tuple
@@ -2214,9 +2213,7 @@ typedef struct HashJoinState
 {
 	JoinState	js;				/* its first field is NodeTag */
 	ExprState  *hashclauses;
-	List	   *hj_OuterHashKeys;	/* list of ExprState nodes */
-	List	   *hj_HashOperators;	/* list of operator OIDs */
-	List	   *hj_Collations;
+	ExprState  *hj_OuterHash;
 	HashJoinTable hj_HashTable;
 	uint32		hj_CurHashValue;
 	int			hj_CurBucketNo;
@@ -2769,7 +2766,10 @@ typedef struct HashState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	HashJoinTable hashtable;	/* hash table for the hashjoin */
-	List	   *hashkeys;		/* list of ExprState nodes */
+	ExprState  *hash_expr;		/* ExprState to get hash value */
+
+	FmgrInfo   *skew_hashfunction;	/* lookup data for skew hash function */
+	Oid			skew_collation; /* collation to call skew_hashfunction with */
 
 	/*
 	 * In a parallelized hash join, the leader retains a pointer to the
-- 
2.34.1

#3David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#1)
3 attachment(s)
Re: Speed up Hash Join by teaching ExprState about hashing

On Sun, 11 Aug 2024 at 22:09, Alexey Dvoichenkov <alexey@hyperplane.net> wrote:

I like the idea so I started looking at this patch. I ran some tests,
the query is an aggregation over a join of two tables with 5M rows,
where "columns" is the number of join conditions. (Mostly the same as
in your test.) The numbers are the average query run-time in seconds.

Thanks for running those tests.

I wondered if the hash table has 5M items that the non-predictable
memory access pattern when probing that table might be drowning out
some of the gains of producing hash values faster. I wrote the
attached script which creates a fairly small table but probes that
table much more than once per hash value. I tried to do that in a way
that didn't read or process lots of shared buffers so as not to put
additional pressure on the CPU caches, which could evict cache lines
of the hash table. I am seeing much larger performance gains from
that test. Up to 26% faster. Please see the attached .png file for the
results. I've also attached the script I used to get those results.
This time I tried 1-6 join columns and also included the test results
for jit=off, jit=on, jit optimize, jit inline for each of the 6
queries. You can see that with 5 and 6 columns that jit inline was
26% faster than master, but just 14% faster with 1 column. The
smallest improvement was with 1 col with jit=on at just 7% faster.

- ExecHashGetHashValue, and
- TupleHashTableHash_internal

.. currently rotate the initial and previous hash values regardless of
the NULL check. So the rotation should probably be placed before the
NULL check in NEXT states if you want to preserve the existing
behavior.

That's my mistake. I think originally I didn't see the sense in
rotating, but you're right. I think not doing that would have (1,
NULL) and (NULL, 1) hash to the same value. Maybe that's ok, but I
think it's much better not to take the risk and keep the behaviour the
same as master. The attached v3 patch does that. I've left the
client_min_messages=debug1 output in the patch for now. I checked the
hash values match with master using a FULL OUTER JOIN with a 3-column
join using 1000 random INTs, 10% of them NULL.

David

Attachments:

hashjoin_bench.sh.txttext/plain; charset=US-ASCII; name=hashjoin_bench.sh.txtDownload
10million_probes_on_1k_hashtab.pngimage/png; name=10million_probes_on_1k_hashtab.pngDownload
v3-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patchapplication/octet-stream; name=v3-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patchDownload
From ca1dfc93b08ebb82c96864ee17a79b9b5b1a008c Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Fri, 10 May 2024 20:22:07 +1200
Subject: [PATCH v3] Speed up Hash Join by making ExprStates hash

---
 src/backend/executor/execExpr.c       | 143 +++++++++++++++++++
 src/backend/executor/execExprInterp.c | 103 ++++++++++++++
 src/backend/executor/nodeHash.c       | 194 ++++++--------------------
 src/backend/executor/nodeHashjoin.c   | 144 +++++++++++++++----
 src/backend/jit/llvm/llvmjit_expr.c   | 159 +++++++++++++++++++++
 src/include/executor/execExpr.h       |  24 ++++
 src/include/executor/executor.h       |   7 +
 src/include/executor/hashjoin.h       |  12 --
 src/include/executor/nodeHash.h       |   3 +-
 src/include/nodes/execnodes.h         |  12 +-
 10 files changed, 605 insertions(+), 196 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 66dda8e5e6..e751f224b8 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3969,6 +3969,149 @@ ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 	}
 }
 
+/*
+ * Build an ExprState that calls the given hash function(s) on the given
+ * 'hash_exprs'.  When multiple expressions are present, the hash values
+ * returned by each function are combined to produce a single hash value.
+ *
+ * desc: tuple descriptor to extract 'hash_exprs' from
+ * ops: TupleTableSlotOps for the TupleDesc
+ * hashfunc_oids: Oid for each hash function to call, one for each 'hash_expr'
+ * collations: collation to use when calling the hash function.
+ * hash_expr: list of expressions to hash the value of
+ * opstrict: array corresponding to the 'hashfunc_oids' to store op_strict()
+ * parent: PlanState node that the 'hash_exprs' will be evaluated at
+ * init_value: Normally 0, but can be set to other values to seed the hash
+ * with some other value.  Using non-zero is slightly less efficient but can
+ * be useful.
+ * keep_nulls: if true, evaluation of the returned ExprState will abort early
+ * returning NULL if the given hash function is strict and the Datum to hash
+ * is null.  When set to false, any NULL input Datums are skipped and the
+ * resulting hash value is unchanged, i.e. it's a hash of all non-null
+ * 'hash_exprs', or 0 if all 'hash_exprs' evaluate to NULL Datums.
+ */
+ExprState *
+ExecBuildHash32Expr(TupleDesc desc, const TupleTableSlotOps *ops,
+					const Oid *hashfunc_oids, const List *collations,
+					const List *hash_exprs, const bool *opstrict,
+					PlanState *parent, uint32 init_value, bool keep_nulls)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	intptr_t	strict_opcode;
+	intptr_t	opcode;
+
+	Assert(list_length(hash_exprs) == list_length(collations));
+
+	state->parent = parent;
+
+	/* Insert setup steps as needed */
+	ExecCreateExprSetupSteps(state, (Node *) hash_exprs);
+
+	if (init_value == 0)
+	{
+		/*
+		 * No initial value, so we can assign the result of the hash function
+		 * for the first hash_expr without having to concern ourselves with
+		 * combining the result with any initial value.
+		 */
+		strict_opcode = EEOP_HASHDATUM_FIRST_STRICT;
+		opcode = EEOP_HASHDATUM_FIRST;
+	}
+	else
+	{
+		/* set up operation to set the initial value */
+		scratch.opcode = EEOP_HASHDATUM_SET_INITVAL;
+		scratch.d.hashdatum_initvalue.init_value = UInt32GetDatum(init_value);
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+
+		ExprEvalPushStep(state, &scratch);
+
+		/*
+		 * When using an initial value, use the NEXT32 ops as the FIRST* ops
+		 * would overwrite the stored initial value.
+		 */
+		strict_opcode = EEOP_HASHDATUM_NEXT32_STRICT;
+		opcode = EEOP_HASHDATUM_NEXT32;
+	}
+
+	forboth(lc, hash_exprs, lc2, collations)
+	{
+		Expr	   *expr = (Expr *) lfirst(lc);
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		int			i = foreach_current_index(lc);
+		Oid			funcid;
+		Oid			inputcollid = lfirst_oid(lc2);
+
+		funcid = hashfunc_oids[i];
+
+		/* Allocate hash function lookup data  */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(1));
+
+		fmgr_info(funcid, finfo);
+
+		/*
+		 * Build the steps to evaluate the hash function's argument have it so
+		 * the value of that is stored in the 0th argument of the hash func.
+		 */
+		ExecInitExprRec(expr,
+						state,
+						&fcinfo->args[0].value,
+						&fcinfo->args[0].isnull);
+
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+
+		/* Initialize function call parameter structure too */
+		InitFunctionCallInfoData(*fcinfo, finfo, 1, inputcollid, NULL, NULL);
+
+		scratch.d.hashdatum.finfo = finfo;
+		scratch.d.hashdatum.fcinfo_data = fcinfo;
+		scratch.d.hashdatum.fn_addr = finfo->fn_addr;
+
+		scratch.opcode = opstrict[i] && !keep_nulls ? strict_opcode : opcode;
+		scratch.d.hashdatum.jumpdone = -1;
+
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps, state->steps_len - 1);
+
+		/*
+		 * For subsequent keys we must combine the hash value with the
+		 * previous hashes.
+		 */
+		strict_opcode = EEOP_HASHDATUM_NEXT32_STRICT;
+		opcode = EEOP_HASHDATUM_NEXT32;
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_HASHDATUM_FIRST ||
+			   as->opcode == EEOP_HASHDATUM_FIRST_STRICT ||
+			   as->opcode == EEOP_HASHDATUM_NEXT32 ||
+			   as->opcode == EEOP_HASHDATUM_NEXT32_STRICT);
+		Assert(as->d.hashdatum.jumpdone == -1);
+		as->d.hashdatum.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
+
 /*
  * Build equality expression that can be evaluated using ExecQual(), returning
  * true if the expression context's inner/outer tuple are NOT DISTINCT. I.e
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ea47c4d6f9..46d29c61fa 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -477,6 +477,11 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_DOMAIN_TESTVAL,
 		&&CASE_EEOP_DOMAIN_NOTNULL,
 		&&CASE_EEOP_DOMAIN_CHECK,
+		&&CASE_EEOP_HASHDATUM_SET_INITVAL,
+		&&CASE_EEOP_HASHDATUM_FIRST,
+		&&CASE_EEOP_HASHDATUM_FIRST_STRICT,
+		&&CASE_EEOP_HASHDATUM_NEXT32,
+		&&CASE_EEOP_HASHDATUM_NEXT32_STRICT,
 		&&CASE_EEOP_CONVERT_ROWTYPE,
 		&&CASE_EEOP_SCALARARRAYOP,
 		&&CASE_EEOP_HASHED_SCALARARRAYOP,
@@ -1543,6 +1548,104 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			EEO_NEXT();
 		}
 
+		EEO_CASE(EEOP_HASHDATUM_SET_INITVAL)
+		{
+			*op->resvalue = op->d.hashdatum_initvalue.init_value;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_FIRST)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			/* leave the hash value alone on NULL inputs */
+			if (!fcinfo->args[0].isnull)
+			{
+				/* execute the hash function and save the resulting value */
+				*op->resvalue = op->d.hashdatum.fn_addr(fcinfo);
+				*op->resnull = false;
+			}
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_FIRST_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			if (fcinfo->args[0].isnull)
+			{
+				/*
+				 * With strict we have the expression return NULL instead of
+				 * ignoring NULL input values.  We've nothing more to do after
+				 * finding a NULL.
+				 */
+				*op->resnull = true;
+				*op->resvalue = (Datum) 0;
+				EEO_JUMP(op->d.hashdatum.jumpdone);
+			}
+
+			/* execute the hash function and save the resulting value */
+			*op->resvalue = op->d.hashdatum.fn_addr(fcinfo);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_NEXT32)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+			uint32		existing_hash = DatumGetUInt32(*op->resvalue);
+
+			/* combine successive hash values by rotating */
+			existing_hash = pg_rotate_left32(existing_hash, 1);
+
+			/* leave the hash value alone on NULL inputs */
+			if (!fcinfo->args[0].isnull)
+			{
+				uint32		hashvalue;
+
+				/* execute hash func and combine with previous hash value */
+				hashvalue = DatumGetUInt32(op->d.hashdatum.fn_addr(fcinfo));
+				existing_hash = existing_hash ^ hashvalue;
+			}
+
+			*op->resvalue = UInt32GetDatum(existing_hash);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_NEXT32_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+			uint32		existing_hash = DatumGetUInt32(*op->resvalue);
+			uint32		hashvalue;
+
+			/* combine successive hash values by rotating */
+			existing_hash = pg_rotate_left32(existing_hash, 1);
+
+			if (fcinfo->args[0].isnull)
+			{
+				/*
+				 * With strict we have the expression return NULL instead of
+				 * ignoring NULL input values.  We've nothing more to do after
+				 * finding a NULL.
+				 */
+				*op->resnull = true;
+				*op->resvalue = (Datum) 0;
+				EEO_JUMP(op->d.hashdatum.jumpdone);
+			}
+
+			/* execute hash func and combine with previous hash value */
+			hashvalue = DatumGetUInt32(op->d.hashdatum.fn_addr(fcinfo));
+			*op->resvalue = UInt32GetDatum(existing_hash ^ hashvalue);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
 		EEO_CASE(EEOP_XMLEXPR)
 		{
 			/* too complex for an inline implementation */
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 61480733a1..ea27bc5ca4 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -47,7 +47,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
-static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
+static void ExecHashBuildSkewHash(HashState *hashstate,
+								  HashJoinTable hashtable, Hash *node,
 								  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 									TupleTableSlot *slot,
@@ -138,11 +139,9 @@ static void
 MultiExecPrivateHash(HashState *node)
 {
 	PlanState  *outerNode;
-	List	   *hashkeys;
 	HashJoinTable hashtable;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
-	uint32		hashvalue;
 
 	/*
 	 * get state info from node
@@ -153,7 +152,6 @@ MultiExecPrivateHash(HashState *node)
 	/*
 	 * set expression context
 	 */
-	hashkeys = node->hashkeys;
 	econtext = node->ps.ps_ExprContext;
 
 	/*
@@ -162,17 +160,30 @@ MultiExecPrivateHash(HashState *node)
 	 */
 	for (;;)
 	{
+		bool		isnull;
+		Datum		hashdatum;
+
 		slot = ExecProcNode(outerNode);
 		if (TupIsNull(slot))
 			break;
 		/* We have to compute the hash value */
 		econtext->ecxt_outertuple = slot;
-		if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
-								 false, hashtable->keepNulls,
-								 &hashvalue))
+
+		ResetExprContext(econtext);
+
+		hashdatum = ExecEvalExprSwitchContext(node->hash_expr, econtext,
+											  &isnull);
+
+		if (!isnull)
 		{
+			uint32		hashvalue = DatumGetUInt32(hashdatum);
 			int			bucketNumber;
 
+#if USE_ASSERT_CHECKING
+			/* XXX just for testing. Not for commit */
+			elog(DEBUG1, "hashdatum = " UINT64_FORMAT, (uint64) hashdatum);
+#endif
+
 			bucketNumber = ExecHashGetSkewBucket(hashtable, hashvalue);
 			if (bucketNumber != INVALID_SKEW_BUCKET_NO)
 			{
@@ -215,7 +226,6 @@ MultiExecParallelHash(HashState *node)
 {
 	ParallelHashJoinState *pstate;
 	PlanState  *outerNode;
-	List	   *hashkeys;
 	HashJoinTable hashtable;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
@@ -232,7 +242,6 @@ MultiExecParallelHash(HashState *node)
 	/*
 	 * set expression context
 	 */
-	hashkeys = node->hashkeys;
 	econtext = node->ps.ps_ExprContext;
 
 	/*
@@ -279,13 +288,20 @@ MultiExecParallelHash(HashState *node)
 			ExecParallelHashTableSetCurrentBatch(hashtable, 0);
 			for (;;)
 			{
+				bool		isnull;
+
 				slot = ExecProcNode(outerNode);
 				if (TupIsNull(slot))
 					break;
 				econtext->ecxt_outertuple = slot;
-				if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
-										 false, hashtable->keepNulls,
-										 &hashvalue))
+
+				ResetExprContext(econtext);
+
+				hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(node->hash_expr,
+																	 econtext,
+																	 &isnull));
+
+				if (!isnull)
 					ExecParallelHashTableInsert(hashtable, slot, hashvalue);
 				hashtable->partialTuples++;
 			}
@@ -371,8 +387,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	hashstate->ps.plan = (Plan *) node;
 	hashstate->ps.state = estate;
 	hashstate->ps.ExecProcNode = ExecHash;
+	/* delay building hashtable until ExecHashTableCreate() in executor run */
 	hashstate->hashtable = NULL;
-	hashstate->hashkeys = NIL;	/* will be set by parent HashJoin */
 
 	/*
 	 * Miscellaneous initialization
@@ -393,12 +409,15 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	ExecInitResultTupleSlotTL(&hashstate->ps, &TTSOpsMinimalTuple);
 	hashstate->ps.ps_ProjInfo = NULL;
 
+	Assert(node->plan.qual == NIL);
+
 	/*
-	 * initialize child expressions
+	 * Delay initialization of hash_expr until ExecInitHashJoin().  We cannot
+	 * build the ExprState here as we don't yet know the join type we're going
+	 * to be hashing values for and we need to know that before calling
+	 * ExecBuildHash32Expr as the keep_nulls parameter depends on the join
+	 * type.
 	 */
-	Assert(node->plan.qual == NIL);
-	hashstate->hashkeys =
-		ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
 
 	return hashstate;
 }
@@ -429,7 +448,7 @@ ExecEndHash(HashState *node)
  * ----------------------------------------------------------------
  */
 HashJoinTable
-ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations, bool keepNulls)
+ExecHashTableCreate(HashState *state)
 {
 	Hash	   *node;
 	HashJoinTable hashtable;
@@ -440,10 +459,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	double		rows;
 	int			num_skew_mcvs;
 	int			log2_nbuckets;
-	int			nkeys;
-	int			i;
-	ListCell   *ho;
-	ListCell   *hc;
 	MemoryContext oldcxt;
 
 	/*
@@ -487,7 +502,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	hashtable->log2_nbuckets = log2_nbuckets;
 	hashtable->log2_nbuckets_optimal = log2_nbuckets;
 	hashtable->buckets.unshared = NULL;
-	hashtable->keepNulls = keepNulls;
 	hashtable->skewEnabled = false;
 	hashtable->skewBucket = NULL;
 	hashtable->skewBucketLen = 0;
@@ -540,32 +554,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 
 	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
 
-	/*
-	 * Get info about the hash functions to be used for each hash key. Also
-	 * remember whether the join operators are strict.
-	 */
-	nkeys = list_length(hashOperators);
-	hashtable->outer_hashfunctions = palloc_array(FmgrInfo, nkeys);
-	hashtable->inner_hashfunctions = palloc_array(FmgrInfo, nkeys);
-	hashtable->hashStrict = palloc_array(bool, nkeys);
-	hashtable->collations = palloc_array(Oid, nkeys);
-	i = 0;
-	forboth(ho, hashOperators, hc, hashCollations)
-	{
-		Oid			hashop = lfirst_oid(ho);
-		Oid			left_hashfn;
-		Oid			right_hashfn;
-
-		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
-			elog(ERROR, "could not find hash function for hash operator %u",
-				 hashop);
-		fmgr_info(left_hashfn, &hashtable->outer_hashfunctions[i]);
-		fmgr_info(right_hashfn, &hashtable->inner_hashfunctions[i]);
-		hashtable->hashStrict[i] = op_strict(hashop);
-		hashtable->collations[i] = lfirst_oid(hc);
-		i++;
-	}
-
 	if (nbatch > 1 && hashtable->parallel_state == NULL)
 	{
 		MemoryContext oldctx;
@@ -652,7 +640,7 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 		 * it.)
 		 */
 		if (nbatch > 1)
-			ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+			ExecHashBuildSkewHash(state, hashtable, node, num_skew_mcvs);
 
 		MemoryContextSwitchTo(oldcxt);
 	}
@@ -1803,103 +1791,6 @@ ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable,
 		heap_free_minimal_tuple(tuple);
 }
 
-/*
- * ExecHashGetHashValue
- *		Compute the hash value for a tuple
- *
- * The tuple to be tested must be in econtext->ecxt_outertuple (thus Vars in
- * the hashkeys expressions need to have OUTER_VAR as varno). If outer_tuple
- * is false (meaning it's the HashJoin's inner node, Hash), econtext,
- * hashkeys, and slot need to be from Hash, with hashkeys/slot referencing and
- * being suitable for tuples from the node below the Hash. Conversely, if
- * outer_tuple is true, econtext is from HashJoin, and hashkeys/slot need to
- * be appropriate for tuples from HashJoin's outer node.
- *
- * A true result means the tuple's hash value has been successfully computed
- * and stored at *hashvalue.  A false result means the tuple cannot match
- * because it contains a null attribute, and hence it should be discarded
- * immediately.  (If keep_nulls is true then false is never returned.)
- */
-bool
-ExecHashGetHashValue(HashJoinTable hashtable,
-					 ExprContext *econtext,
-					 List *hashkeys,
-					 bool outer_tuple,
-					 bool keep_nulls,
-					 uint32 *hashvalue)
-{
-	uint32		hashkey = 0;
-	FmgrInfo   *hashfunctions;
-	ListCell   *hk;
-	int			i = 0;
-	MemoryContext oldContext;
-
-	/*
-	 * We reset the eval context each time to reclaim any memory leaked in the
-	 * hashkey expressions.
-	 */
-	ResetExprContext(econtext);
-
-	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
-	if (outer_tuple)
-		hashfunctions = hashtable->outer_hashfunctions;
-	else
-		hashfunctions = hashtable->inner_hashfunctions;
-
-	foreach(hk, hashkeys)
-	{
-		ExprState  *keyexpr = (ExprState *) lfirst(hk);
-		Datum		keyval;
-		bool		isNull;
-
-		/* combine successive hashkeys by rotating */
-		hashkey = pg_rotate_left32(hashkey, 1);
-
-		/*
-		 * Get the join attribute value of the tuple
-		 */
-		keyval = ExecEvalExpr(keyexpr, econtext, &isNull);
-
-		/*
-		 * If the attribute is NULL, and the join operator is strict, then
-		 * this tuple cannot pass the join qual so we can reject it
-		 * immediately (unless we're scanning the outside of an outer join, in
-		 * which case we must not reject it).  Otherwise we act like the
-		 * hashcode of NULL is zero (this will support operators that act like
-		 * IS NOT DISTINCT, though not any more-random behavior).  We treat
-		 * the hash support function as strict even if the operator is not.
-		 *
-		 * Note: currently, all hashjoinable operators must be strict since
-		 * the hash index AM assumes that.  However, it takes so little extra
-		 * code here to allow non-strict that we may as well do it.
-		 */
-		if (isNull)
-		{
-			if (hashtable->hashStrict[i] && !keep_nulls)
-			{
-				MemoryContextSwitchTo(oldContext);
-				return false;	/* cannot match */
-			}
-			/* else, leave hashkey unmodified, equivalent to hashcode 0 */
-		}
-		else
-		{
-			/* Compute the hash function */
-			uint32		hkey;
-
-			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i], hashtable->collations[i], keyval));
-			hashkey ^= hkey;
-		}
-
-		i++;
-	}
-
-	MemoryContextSwitchTo(oldContext);
-
-	*hashvalue = hashkey;
-	return true;
-}
 
 /*
  * ExecHashGetBucketAndBatch
@@ -2372,7 +2263,8 @@ ExecReScanHash(HashState *node)
  *		based on available memory.
  */
 static void
-ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
+ExecHashBuildSkewHash(HashState *hashstate, HashJoinTable hashtable,
+					  Hash *node, int mcvsToUse)
 {
 	HeapTupleData *statsTuple;
 	AttStatsSlot sslot;
@@ -2400,7 +2292,6 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 	{
 		double		frac;
 		int			nbuckets;
-		FmgrInfo   *hashfunctions;
 		int			i;
 
 		if (mcvsToUse > sslot.nvalues)
@@ -2468,15 +2359,14 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 		 * ExecHashRemoveNextSkewBucket) and we want the least common MCVs to
 		 * be removed first.
 		 */
-		hashfunctions = hashtable->outer_hashfunctions;
 
 		for (i = 0; i < mcvsToUse; i++)
 		{
 			uint32		hashvalue;
 			int			bucket;
 
-			hashvalue = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[0],
-														 hashtable->collations[0],
+			hashvalue = DatumGetUInt32(FunctionCall1Coll(hashstate->skew_hashfunction,
+														 hashstate->skew_collation,
 														 sslot.values[i]));
 
 			/*
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 5429e68734..40d7436ddc 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -169,6 +169,7 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "utils/lsyscache.h"
 #include "utils/sharedtuplestore.h"
 #include "utils/wait_event.h"
 
@@ -331,10 +332,7 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
 				 * whoever gets here first will create the hash table and any
 				 * later arrivals will merely attach to it.
 				 */
-				hashtable = ExecHashTableCreate(hashNode,
-												node->hj_HashOperators,
-												node->hj_Collations,
-												HJ_FILL_INNER(node));
+				hashtable = ExecHashTableCreate(hashNode);
 				node->hj_HashTable = hashtable;
 
 				/*
@@ -820,9 +818,97 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	 */
 	{
 		HashState  *hashstate = (HashState *) innerPlanState(hjstate);
+		Hash	   *hash = (Hash *) hashstate->ps.plan;
 		TupleTableSlot *slot = hashstate->ps.ps_ResultTupleSlot;
+		Oid		   *outer_hashfuncid;
+		Oid		   *inner_hashfuncid;
+		bool	   *hash_strict;
+		ListCell   *lc;
+		int			nkeys;
+
 
 		hjstate->hj_HashTupleSlot = slot;
+
+		/*
+		 * Build ExprStates to obtain hash values for either side of the join.
+		 * This must be done here as ExecBuildHash32Expr needs to know how to
+		 * handle NULL inputs and the required handling of that depends on the
+		 * jointype.  We don't know the join type in ExecInitHash() and we
+		 * must build the ExprStates before ExecHashTableCreate() so we
+		 * properly attribute any SubPlans that exist in the hash expressions
+		 * to the correct PlanState.
+		 */
+		nkeys = list_length(node->hashoperators);
+
+		/* XXX worth putting these on the stack when nkeys == 1? */
+		outer_hashfuncid = palloc_array(Oid, nkeys);
+		inner_hashfuncid = palloc_array(Oid, nkeys);
+		hash_strict = palloc_array(bool, nkeys);
+
+		/*
+		 * Determine the hash function for each side of the join for the given
+		 * hash operator.
+		 */
+		foreach(lc, node->hashoperators)
+		{
+			Oid			hashop = lfirst_oid(lc);
+			int			i = foreach_current_index(lc);
+
+			if (!get_op_hash_functions(hashop,
+									   &outer_hashfuncid[i],
+									   &inner_hashfuncid[i]))
+				elog(ERROR,
+					 "could not find hash function for hash operator %u",
+					 hashop);
+			hash_strict[i] = op_strict(hashop);
+		}
+
+		/*
+		 * Build an ExprState to generate the hash value for the expressions
+		 * on the outer of the join.  This ExprState must finish generating
+		 * the hash value when HJ_FILL_OUTER() is true.  Otherwise,
+		 * ExecBuildHash32Expr will set up the ExprState to abort early if it
+		 * finds a NULL.  In these cases, we don't need to store these tuples
+		 * in the hash table as the jointype does not require it.
+		 */
+		hjstate->hj_OuterHash =
+			ExecBuildHash32Expr(hjstate->js.ps.ps_ResultTupleDesc,
+								hjstate->js.ps.resultops,
+								outer_hashfuncid,
+								node->hashcollations,
+								node->hashkeys,
+								hash_strict,
+								&hjstate->js.ps,
+								0,
+								HJ_FILL_OUTER(hjstate));
+
+		/* As above, but for the inner side of the join */
+		hashstate->hash_expr =
+			ExecBuildHash32Expr(hashstate->ps.ps_ResultTupleDesc,
+								hashstate->ps.resultops,
+								inner_hashfuncid,
+								node->hashcollations,
+								hash->hashkeys,
+								hash_strict,
+								&hashstate->ps,
+								0,
+								HJ_FILL_INNER(hjstate));
+
+		/*
+		 * Set up the skew table hash function while we have a record of the
+		 * first key's hash function Oid.
+		 */
+		if (OidIsValid(hash->skewTable))
+		{
+			hashstate->skew_hashfunction = palloc0(sizeof(FmgrInfo));
+			hashstate->skew_collation = linitial_oid(node->hashcollations);
+			fmgr_info(outer_hashfuncid[0], hashstate->skew_hashfunction);
+		}
+
+		/* no need to keep these */
+		pfree(outer_hashfuncid);
+		pfree(inner_hashfuncid);
+		pfree(hash_strict);
 	}
 
 	/*
@@ -846,11 +932,6 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	hjstate->hj_CurSkewBucketNo = INVALID_SKEW_BUCKET_NO;
 	hjstate->hj_CurTuple = NULL;
 
-	hjstate->hj_OuterHashKeys = ExecInitExprList(node->hashkeys,
-												 (PlanState *) hjstate);
-	hjstate->hj_HashOperators = node->hashoperators;
-	hjstate->hj_Collations = node->hashcollations;
-
 	hjstate->hj_JoinState = HJ_BUILD_HASHTABLE;
 	hjstate->hj_MatchedOuter = false;
 	hjstate->hj_OuterNotEmpty = false;
@@ -918,17 +999,22 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode,
 
 		while (!TupIsNull(slot))
 		{
+			bool		isnull;
+
 			/*
 			 * We have to compute the tuple's hash value.
 			 */
 			ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
 
 			econtext->ecxt_outertuple = slot;
-			if (ExecHashGetHashValue(hashtable, econtext,
-									 hjstate->hj_OuterHashKeys,
-									 true,	/* outer tuple */
-									 HJ_FILL_OUTER(hjstate),
-									 hashvalue))
+
+			ResetExprContext(econtext);
+
+			*hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+																  econtext,
+																  &isnull));
+
+			if (!isnull)
 			{
 				/* remember outer relation is not empty for possible rescan */
 				hjstate->hj_OuterNotEmpty = true;
@@ -989,14 +1075,19 @@ ExecParallelHashJoinOuterGetTuple(PlanState *outerNode,
 
 		while (!TupIsNull(slot))
 		{
+			bool		isnull;
+
 			ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
 
 			econtext->ecxt_outertuple = slot;
-			if (ExecHashGetHashValue(hashtable, econtext,
-									 hjstate->hj_OuterHashKeys,
-									 true,	/* outer tuple */
-									 HJ_FILL_OUTER(hjstate),
-									 hashvalue))
+
+			ResetExprContext(econtext);
+
+			*hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+																  econtext,
+																  &isnull));
+
+			if (!isnull)
 				return slot;
 
 			/*
@@ -1518,15 +1609,20 @@ ExecParallelHashJoinPartitionOuter(HashJoinState *hjstate)
 	/* Execute outer plan, writing all tuples to shared tuplestores. */
 	for (;;)
 	{
+		bool		isnull;
+
 		slot = ExecProcNode(outerState);
 		if (TupIsNull(slot))
 			break;
 		econtext->ecxt_outertuple = slot;
-		if (ExecHashGetHashValue(hashtable, econtext,
-								 hjstate->hj_OuterHashKeys,
-								 true,	/* outer tuple */
-								 HJ_FILL_OUTER(hjstate),
-								 &hashvalue))
+
+		ResetExprContext(econtext);
+
+		hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+															 econtext,
+															 &isnull));
+
+		if (!isnull)
 		{
 			int			batchno;
 			int			bucketno;
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 27f94f9007..58767932b8 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -1900,6 +1900,165 @@ llvm_compile_expr(ExprState *state)
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
+			case EEOP_HASHDATUM_SET_INITVAL:
+				{
+					LLVMValueRef v_initvalue;
+
+					v_initvalue = l_sizet_const(op->d.hashdatum_initvalue.init_value);
+
+					LLVMBuildStore(b, v_initvalue, v_resvaluep);
+					LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}
+
+			case EEOP_HASHDATUM_FIRST:
+			case EEOP_HASHDATUM_FIRST_STRICT:
+			case EEOP_HASHDATUM_NEXT32:
+			case EEOP_HASHDATUM_NEXT32_STRICT:
+				{
+					FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+					LLVMValueRef v_fcinfo;
+					LLVMValueRef v_fcinfo_isnull;
+					LLVMValueRef v_retval;
+					LLVMBasicBlockRef b_checkargnull;
+					LLVMBasicBlockRef b_ifnotnull;
+					LLVMBasicBlockRef b_ifnullblock;
+					LLVMValueRef v_argisnull;
+					LLVMValueRef v_prevhash = NULL;
+
+					/*
+					 * If not hashing the first value then we must load the
+					 * previous hash value and rotate it left one bit.
+					 */
+					if (opcode == EEOP_HASHDATUM_NEXT32 ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						LLVMValueRef v_tmp1;
+						LLVMValueRef v_tmp2;
+
+						/*
+						 * Fetch the previously hashed value from where the
+						 * EEOP_HASHDATUM_FIRST* operation stored it.
+						 */
+						v_prevhash = l_load(b, TypeSizeT, v_resvaluep, "");
+
+						/*
+						 * Rotate bits left by 1 bit.  Be careful not to
+						 * overflow uint32 when working with size_t.
+						 */
+						v_tmp1 = LLVMBuildShl(b, v_prevhash, l_sizet_const(1),
+											  "");
+						v_tmp1 = LLVMBuildAnd(b, v_tmp1,
+											  l_sizet_const(0xffffffff), "");
+						v_tmp2 = LLVMBuildLShr(b, v_prevhash,
+											   l_sizet_const(31), "");
+						v_prevhash = LLVMBuildOr(b, v_tmp1, v_tmp2, "");
+					}
+
+					/*
+					 * Block for the actual function call, if args are
+					 * non-NULL.
+					 */
+					b_ifnotnull = l_bb_before_v(opblocks[opno + 1],
+												"b.%d.ifnotnull",
+												opno);
+
+					/* we expect the hash function to have 1 argument */
+					if (fcinfo->nargs != 1)
+						elog(ERROR, "incorrect number of function arguments");
+
+					v_fcinfo = l_ptr_const(fcinfo,
+										   l_ptr(StructFunctionCallInfoData));
+
+					b_checkargnull = l_bb_before_v(b_ifnotnull,
+												   "b.%d.isnull.0", opno);
+
+					LLVMBuildBr(b, b_checkargnull);
+
+					/*
+					 * Determine what to do if we find the argument to be
+					 * NULL.
+					 */
+					if (opcode == EEOP_HASHDATUM_FIRST_STRICT ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						b_ifnullblock = l_bb_before_v(b_ifnotnull,
+													  "b.%d.strictnull",
+													  opno);
+
+						LLVMPositionBuilderAtEnd(b, b_ifnullblock);
+
+						/*
+						 * In strict node, NULL inputs result in NULL.  Save
+						 * the NULL result and goto jumpdone.
+						 */
+						LLVMBuildStore(b, l_sbool_const(1), v_resnullp);
+						LLVMBuildStore(b, l_sizet_const(0), v_resvaluep);
+						LLVMBuildBr(b, opblocks[op->d.hashdatum.jumpdone]);
+					}
+					else if (opcode == EEOP_HASHDATUM_NEXT32)
+					{
+						b_ifnullblock = l_bb_before_v(b_ifnotnull,
+													  "b.%d.null",
+													  opno);
+
+						LLVMPositionBuilderAtEnd(b, b_ifnullblock);
+
+						Assert(v_prevhash != NULL);
+
+						/*
+						 * Save the rotated hash value and skip to the next
+						 * op.
+						 */
+						LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+						LLVMBuildStore(b, v_prevhash, v_resvaluep);
+						LLVMBuildBr(b, opblocks[opno + 1]);
+					}
+					else
+					{
+						/*
+						 * For EEOP_HASHDATUM_FIRST, there's no previous hash
+						 * value to store, so nothing extra to do when we get
+						 * a NULL input.
+						 */
+						b_ifnullblock = opblocks[opno + 1];
+					}
+
+					LLVMPositionBuilderAtEnd(b, b_checkargnull);
+
+					/* emit code to check if the input parameter is NULL */
+					v_argisnull = l_funcnull(b, v_fcinfo, 0);
+					LLVMBuildCondBr(b,
+									LLVMBuildICmp(b,
+												  LLVMIntEQ,
+												  v_argisnull,
+												  l_sbool_const(1),
+												  ""),
+									b_ifnullblock,
+									b_ifnotnull);
+
+					LLVMPositionBuilderAtEnd(b, b_ifnotnull);
+
+					/* call the hash function */
+					v_retval = BuildV1Call(context, b, mod, fcinfo,
+										   &v_fcinfo_isnull);
+
+					/*
+					 * For NEXT32 ops, XOR (^) the returned hash value with
+					 * the existing hash value.
+					 */
+					if (opcode == EEOP_HASHDATUM_NEXT32 ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+						v_retval = LLVMBuildXor(b, v_prevhash, v_retval, "");
+
+					LLVMBuildStore(b, v_retval, v_resvaluep);
+					LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+
+					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}
+
 			case EEOP_CONVERT_ROWTYPE:
 				build_EvalXFunc(b, mod, "ExecEvalConvertRowtype",
 								v_state, op, v_econtext);
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 845f3422de..eec0aa699e 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -235,6 +235,13 @@ typedef enum ExprEvalOp
 	/* evaluate a single domain CHECK constraint */
 	EEOP_DOMAIN_CHECK,
 
+	/* evaluation steps for hashing */
+	EEOP_HASHDATUM_SET_INITVAL,
+	EEOP_HASHDATUM_FIRST,
+	EEOP_HASHDATUM_FIRST_STRICT,
+	EEOP_HASHDATUM_NEXT32,
+	EEOP_HASHDATUM_NEXT32_STRICT,
+
 	/* evaluate assorted special-purpose expression types */
 	EEOP_CONVERT_ROWTYPE,
 	EEOP_SCALARARRAYOP,
@@ -558,6 +565,23 @@ typedef struct ExprEvalStep
 			ErrorSaveContext *escontext;
 		}			domaincheck;
 
+		/* for EEOP_HASH_SET_INITVAL */
+		struct
+		{
+			Datum		init_value;
+
+		}			hashdatum_initvalue;
+
+		/* for EEOP_HASHDATUM_(FIRST|NEXT32)[_STRICT] */
+		struct
+		{
+			FmgrInfo   *finfo;	/* function's lookup data */
+			FunctionCallInfo fcinfo_data;	/* arguments etc */
+			/* faster to access without additional indirection: */
+			PGFunction	fn_addr;	/* actual call address */
+			int			jumpdone;	/* jump here on null */
+		}			hashdatum;
+
 		/* for EEOP_CONVERT_ROWTYPE */
 		struct
 		{
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9770752ea3..046a7fb69b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -285,6 +285,13 @@ extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
 extern List *ExecInitExprList(List *nodes, PlanState *parent);
 extern ExprState *ExecBuildAggTrans(AggState *aggstate, struct AggStatePerPhaseData *phase,
 									bool doSort, bool doHash, bool nullcheck);
+extern ExprState *ExecBuildHash32Expr(TupleDesc desc,
+									  const TupleTableSlotOps *ops,
+									  const Oid *hashfunc_oids,
+									  const List *collations,
+									  const List *hash_exprs,
+									  const bool *opstrict, PlanState *parent,
+									  uint32 init_value, bool keep_nulls);
 extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const TupleTableSlotOps *lops, const TupleTableSlotOps *rops,
 										 int numCols,
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 9197846cda..2d8ed8688c 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -313,8 +313,6 @@ typedef struct HashJoinTableData
 		dsa_pointer_atomic *shared;
 	}			buckets;
 
-	bool		keepNulls;		/* true to store unmatchable NULL tuples */
-
 	bool		skewEnabled;	/* are we using skew optimization? */
 	HashSkewBucket **skewBucket;	/* hashtable of skew buckets */
 	int			skewBucketLen;	/* size of skewBucket array (a power of 2!) */
@@ -343,16 +341,6 @@ typedef struct HashJoinTableData
 	BufFile   **innerBatchFile; /* buffered virtual temp file per batch */
 	BufFile   **outerBatchFile; /* buffered virtual temp file per batch */
 
-	/*
-	 * Info about the datatype-specific hash functions for the datatypes being
-	 * hashed. These are arrays of the same length as the number of hash join
-	 * clauses (hash keys).
-	 */
-	FmgrInfo   *outer_hashfunctions;	/* lookup data for hash functions */
-	FmgrInfo   *inner_hashfunctions;	/* lookup data for hash functions */
-	bool	   *hashStrict;		/* is each hash join operator strict? */
-	Oid		   *collations;
-
 	Size		spaceUsed;		/* memory space currently used by tuples */
 	Size		spaceAllowed;	/* upper limit for space used */
 	Size		spacePeak;		/* peak space used */
diff --git a/src/include/executor/nodeHash.h b/src/include/executor/nodeHash.h
index a95911c2fe..28dd2229f8 100644
--- a/src/include/executor/nodeHash.h
+++ b/src/include/executor/nodeHash.h
@@ -24,8 +24,7 @@ extern Node *MultiExecHash(HashState *node);
 extern void ExecEndHash(HashState *node);
 extern void ExecReScanHash(HashState *node);
 
-extern HashJoinTable ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
-										 bool keepNulls);
+extern HashJoinTable ExecHashTableCreate(HashState *state);
 extern void ExecParallelHashTableAlloc(HashJoinTable hashtable,
 									   int batchno);
 extern void ExecHashTableDestroy(HashJoinTable hashtable);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 87f1519ec6..af7d8fd1e7 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2184,8 +2184,7 @@ typedef struct MergeJoinState
  *	 HashJoinState information
  *
  *		hashclauses				original form of the hashjoin condition
- *		hj_OuterHashKeys		the outer hash keys in the hashjoin condition
- *		hj_HashOperators		the join operators in the hashjoin condition
+ *		hj_OuterHash			ExprState for hashing outer keys
  *		hj_HashTable			hash table for the hashjoin
  *								(NULL if table not built yet)
  *		hj_CurHashValue			hash value for current outer tuple
@@ -2215,9 +2214,7 @@ typedef struct HashJoinState
 {
 	JoinState	js;				/* its first field is NodeTag */
 	ExprState  *hashclauses;
-	List	   *hj_OuterHashKeys;	/* list of ExprState nodes */
-	List	   *hj_HashOperators;	/* list of operator OIDs */
-	List	   *hj_Collations;
+	ExprState  *hj_OuterHash;
 	HashJoinTable hj_HashTable;
 	uint32		hj_CurHashValue;
 	int			hj_CurBucketNo;
@@ -2770,7 +2767,10 @@ typedef struct HashState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	HashJoinTable hashtable;	/* hash table for the hashjoin */
-	List	   *hashkeys;		/* list of ExprState nodes */
+	ExprState  *hash_expr;		/* ExprState to get hash value */
+
+	FmgrInfo   *skew_hashfunction;	/* lookup data for skew hash function */
+	Oid			skew_collation; /* collation to call skew_hashfunction with */
 
 	/*
 	 * In a parallelized hash join, the leader retains a pointer to the
-- 
2.34.1

#4David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#1)
1 attachment(s)
Re: Speed up Hash Join by teaching ExprState about hashing

On Thu, 15 Aug 2024 at 19:50, Alexey Dvoichenkov <alexey@hyperplane.net> wrote:

I gave v3 another look. One tiny thing I've noticed is that you
removed ExecHashGetHashValue() but not its forward declaration in
include/executor/nodeHash.h

Fixed

I also reviewed the JIT code this time, it looks reasonable to
me. I've added names to some variables to make the IR easier to
read. (Probably best to squash it into your patch, if you want to
apply this.)

Thanks. I've included that.

I made another complete pass over this today and I noticed that there
were a few cases where I wasn't properly setting resnull and resvalue
to (Datum) 0.

I'm happy with the patch now. I am aware nothing currently uses
EEOP_HASHDATUM_SET_INITVAL, but I want to get moving with the Hash
Aggregate usages of this code fairly quickly and I'd rather get the
ExprState step code done now and not have to change it again.

v4 patch attached. If nobody else wants to look at this then I'm
planning on pushing it soon.

David

Attachments:

v4-0001-Speed-up-Hash-Join-by-making-ExprStates-support-h.patchapplication/octet-stream; name=v4-0001-Speed-up-Hash-Join-by-making-ExprStates-support-h.patchDownload
From 20f2381eb4fe3eb18a8afd15ecbe6c5142e5357d Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Fri, 10 May 2024 20:22:07 +1200
Subject: [PATCH v4] Speed up Hash Join by making ExprStates support hashing

Here we add ExprState support for obtaining a 32-bit hash value from a
list of expressions.  This allows both faster hashing and also JIT
compilation of these expressions.  This is especially useful when hash
joins have multiple join keys as the previous code called ExecEvalExpr on
each hash join key individually and that was inefficient as tuple
deformation would have only taken into account one key at a time, which
could lead to walking the tuple once for each join key.  With the new
code, we'll determine the maximum attribute required and deform the tuple
to that point only once.

Some performance tests done with this change have shown up to a 20%
performance increase of a query containing a Hash Join without JIT
compilation and up to a 26% performance increase when JIT is enabled and
optimization and inlining were performed by the JIT compiler.  The
performance increase with 1 join column was less with a 14% increase
with and without JIT.  This test was done using a fairly small hash
table and a large number of hash probes.  The increase will likely be
less with large tables, especially ones larger than L3 cache as memory
pressure is more likely to be the limiting factor there.

This commit only addresses Hash Joins, but lays expression evaluation
and JIT compilation infrastructure for other hashing needs such as Hash
Aggregate.  That will be done as a follow-up commit.

Author: David Rowley
Reviewed-by: Alexey Dvoichenkov <alexey@hyperplane.net>
Discussion: https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm%3Dm0L%2BGc%3DT%3DkEa4g%40mail.gmail.com
---
 src/backend/executor/execExpr.c       | 141 +++++++++++++++++++
 src/backend/executor/execExprInterp.c | 104 ++++++++++++++
 src/backend/executor/nodeHash.c       | 190 ++++++--------------------
 src/backend/executor/nodeHashjoin.c   | 143 +++++++++++++++----
 src/backend/jit/llvm/llvmjit_expr.c   | 169 +++++++++++++++++++++++
 src/include/executor/execExpr.h       |  24 ++++
 src/include/executor/executor.h       |   7 +
 src/include/executor/hashjoin.h       |  12 --
 src/include/executor/nodeHash.h       |   9 +-
 src/include/nodes/execnodes.h         |  12 +-
 10 files changed, 609 insertions(+), 202 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 66dda8e5e6..ddbf760746 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3969,6 +3969,147 @@ ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 	}
 }
 
+/*
+ * Build an ExprState that calls the given hash function(s) on the given
+ * 'hash_exprs'.  When multiple expressions are present, the hash values
+ * returned by each function are combined to produce a single hash value.
+ *
+ * desc: tuple descriptor for the to-be-hashed expressions
+ * ops: TupleTableSlotOps for the TupleDesc
+ * hashfunc_oids: Oid for each hash function to call, one for each 'hash_expr'
+ * collations: collation to use when calling the hash function.
+ * hash_expr: list of expressions to hash the value of
+ * opstrict: array corresponding to the 'hashfunc_oids' to store op_strict()
+ * parent: PlanState node that the 'hash_exprs' will be evaluated at
+ * init_value: Normally 0, but can be set to other values to seed the hash
+ * with some other value.  Using non-zero is slightly less efficient but can
+ * be useful.
+ * keep_nulls: if true, evaluation of the returned ExprState will abort early
+ * returning NULL if the given hash function is strict and the Datum to hash
+ * is null.  When set to false, any NULL input Datums are skipped.
+ */
+ExprState *
+ExecBuildHash32Expr(TupleDesc desc, const TupleTableSlotOps *ops,
+					const Oid *hashfunc_oids, const List *collations,
+					const List *hash_exprs, const bool *opstrict,
+					PlanState *parent, uint32 init_value, bool keep_nulls)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	intptr_t	strict_opcode;
+	intptr_t	opcode;
+
+	Assert(list_length(hash_exprs) == list_length(collations));
+
+	state->parent = parent;
+
+	/* Insert setup steps as needed. */
+	ExecCreateExprSetupSteps(state, (Node *) hash_exprs);
+
+	if (init_value == 0)
+	{
+		/*
+		 * No initial value, so we can assign the result of the hash function
+		 * for the first hash_expr without having to concern ourselves with
+		 * combining the result with any initial value.
+		 */
+		strict_opcode = EEOP_HASHDATUM_FIRST_STRICT;
+		opcode = EEOP_HASHDATUM_FIRST;
+	}
+	else
+	{
+		/* Set up operation to set the initial value. */
+		scratch.opcode = EEOP_HASHDATUM_SET_INITVAL;
+		scratch.d.hashdatum_initvalue.init_value = UInt32GetDatum(init_value);
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+
+		ExprEvalPushStep(state, &scratch);
+
+		/*
+		 * When using an initial value use the NEXT32/NEXT32_STRICT ops as the
+		 * FIRST/FIRST_STRICT ops would overwrite the stored initial value.
+		 */
+		strict_opcode = EEOP_HASHDATUM_NEXT32_STRICT;
+		opcode = EEOP_HASHDATUM_NEXT32;
+	}
+
+	forboth(lc, hash_exprs, lc2, collations)
+	{
+		Expr	   *expr = (Expr *) lfirst(lc);
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		int			i = foreach_current_index(lc);
+		Oid			funcid;
+		Oid			inputcollid = lfirst_oid(lc2);
+
+		funcid = hashfunc_oids[i];
+
+		/* Allocate hash function lookup data. */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(1));
+
+		fmgr_info(funcid, finfo);
+
+		/*
+		 * Build the steps to evaluate the hash function's argument have it so
+		 * the value of that is stored in the 0th argument of the hash func.
+		 */
+		ExecInitExprRec(expr,
+						state,
+						&fcinfo->args[0].value,
+						&fcinfo->args[0].isnull);
+
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+
+		/* Initialize function call parameter structure too */
+		InitFunctionCallInfoData(*fcinfo, finfo, 1, inputcollid, NULL, NULL);
+
+		scratch.d.hashdatum.finfo = finfo;
+		scratch.d.hashdatum.fcinfo_data = fcinfo;
+		scratch.d.hashdatum.fn_addr = finfo->fn_addr;
+
+		scratch.opcode = opstrict[i] && !keep_nulls ? strict_opcode : opcode;
+		scratch.d.hashdatum.jumpdone = -1;
+
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps, state->steps_len - 1);
+
+		/*
+		 * For subsequent keys we must combine the hash value with the
+		 * previous hashes.
+		 */
+		strict_opcode = EEOP_HASHDATUM_NEXT32_STRICT;
+		opcode = EEOP_HASHDATUM_NEXT32;
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_HASHDATUM_FIRST ||
+			   as->opcode == EEOP_HASHDATUM_FIRST_STRICT ||
+			   as->opcode == EEOP_HASHDATUM_NEXT32 ||
+			   as->opcode == EEOP_HASHDATUM_NEXT32_STRICT);
+		Assert(as->d.hashdatum.jumpdone == -1);
+		as->d.hashdatum.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
+
 /*
  * Build equality expression that can be evaluated using ExecQual(), returning
  * true if the expression context's inner/outer tuple are NOT DISTINCT. I.e
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ea47c4d6f9..9389d44c81 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -477,6 +477,11 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_DOMAIN_TESTVAL,
 		&&CASE_EEOP_DOMAIN_NOTNULL,
 		&&CASE_EEOP_DOMAIN_CHECK,
+		&&CASE_EEOP_HASHDATUM_SET_INITVAL,
+		&&CASE_EEOP_HASHDATUM_FIRST,
+		&&CASE_EEOP_HASHDATUM_FIRST_STRICT,
+		&&CASE_EEOP_HASHDATUM_NEXT32,
+		&&CASE_EEOP_HASHDATUM_NEXT32_STRICT,
 		&&CASE_EEOP_CONVERT_ROWTYPE,
 		&&CASE_EEOP_SCALARARRAYOP,
 		&&CASE_EEOP_HASHED_SCALARARRAYOP,
@@ -1543,6 +1548,105 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			EEO_NEXT();
 		}
 
+		EEO_CASE(EEOP_HASHDATUM_SET_INITVAL)
+		{
+			*op->resvalue = op->d.hashdatum_initvalue.init_value;
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_FIRST)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			if (!fcinfo->args[0].isnull)
+				*op->resvalue = op->d.hashdatum.fn_addr(fcinfo);
+			else
+				*op->resvalue = (Datum) 0;
+
+			/* leave the hash value alone on NULL inputs */
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_FIRST_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			if (fcinfo->args[0].isnull)
+			{
+				/*
+				 * With strict we have the expression return NULL instead of
+				 * ignoring NULL input values.  We've nothing more to do after
+				 * finding a NULL.
+				 */
+				*op->resnull = true;
+				*op->resvalue = (Datum) 0;
+				EEO_JUMP(op->d.hashdatum.jumpdone);
+			}
+
+			/* execute the hash function and save the resulting value */
+			*op->resvalue = op->d.hashdatum.fn_addr(fcinfo);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_NEXT32)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+			uint32		existing_hash = DatumGetUInt32(*op->resvalue);
+
+			/* combine successive hash values by rotating */
+			existing_hash = pg_rotate_left32(existing_hash, 1);
+
+			/* leave the hash value alone on NULL inputs */
+			if (!fcinfo->args[0].isnull)
+			{
+				uint32		hashvalue;
+
+				/* execute hash func and combine with previous hash value */
+				hashvalue = DatumGetUInt32(op->d.hashdatum.fn_addr(fcinfo));
+				existing_hash = existing_hash ^ hashvalue;
+			}
+
+			*op->resvalue = UInt32GetDatum(existing_hash);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_NEXT32_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+			uint32		existing_hash = DatumGetUInt32(*op->resvalue);
+			uint32		hashvalue;
+
+			/* combine successive hash values by rotating */
+			existing_hash = pg_rotate_left32(existing_hash, 1);
+
+			if (fcinfo->args[0].isnull)
+			{
+				/*
+				 * With strict we have the expression return NULL instead of
+				 * ignoring NULL input values.  We've nothing more to do after
+				 * finding a NULL.
+				 */
+				*op->resnull = true;
+				*op->resvalue = (Datum) 0;
+				EEO_JUMP(op->d.hashdatum.jumpdone);
+			}
+
+			/* execute hash func and combine with previous hash value */
+			hashvalue = DatumGetUInt32(op->d.hashdatum.fn_addr(fcinfo));
+			*op->resvalue = UInt32GetDatum(existing_hash ^ hashvalue);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
 		EEO_CASE(EEOP_XMLEXPR)
 		{
 			/* too complex for an inline implementation */
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 61480733a1..570a90ebe1 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -47,7 +47,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
-static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
+static void ExecHashBuildSkewHash(HashState *hashstate,
+								  HashJoinTable hashtable, Hash *node,
 								  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 									TupleTableSlot *slot,
@@ -138,11 +139,9 @@ static void
 MultiExecPrivateHash(HashState *node)
 {
 	PlanState  *outerNode;
-	List	   *hashkeys;
 	HashJoinTable hashtable;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
-	uint32		hashvalue;
 
 	/*
 	 * get state info from node
@@ -153,7 +152,6 @@ MultiExecPrivateHash(HashState *node)
 	/*
 	 * set expression context
 	 */
-	hashkeys = node->hashkeys;
 	econtext = node->ps.ps_ExprContext;
 
 	/*
@@ -162,15 +160,23 @@ MultiExecPrivateHash(HashState *node)
 	 */
 	for (;;)
 	{
+		bool		isnull;
+		Datum		hashdatum;
+
 		slot = ExecProcNode(outerNode);
 		if (TupIsNull(slot))
 			break;
 		/* We have to compute the hash value */
 		econtext->ecxt_outertuple = slot;
-		if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
-								 false, hashtable->keepNulls,
-								 &hashvalue))
+
+		ResetExprContext(econtext);
+
+		hashdatum = ExecEvalExprSwitchContext(node->hash_expr, econtext,
+											  &isnull);
+
+		if (!isnull)
 		{
+			uint32		hashvalue = DatumGetUInt32(hashdatum);
 			int			bucketNumber;
 
 			bucketNumber = ExecHashGetSkewBucket(hashtable, hashvalue);
@@ -215,7 +221,6 @@ MultiExecParallelHash(HashState *node)
 {
 	ParallelHashJoinState *pstate;
 	PlanState  *outerNode;
-	List	   *hashkeys;
 	HashJoinTable hashtable;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
@@ -232,7 +237,6 @@ MultiExecParallelHash(HashState *node)
 	/*
 	 * set expression context
 	 */
-	hashkeys = node->hashkeys;
 	econtext = node->ps.ps_ExprContext;
 
 	/*
@@ -279,13 +283,20 @@ MultiExecParallelHash(HashState *node)
 			ExecParallelHashTableSetCurrentBatch(hashtable, 0);
 			for (;;)
 			{
+				bool		isnull;
+
 				slot = ExecProcNode(outerNode);
 				if (TupIsNull(slot))
 					break;
 				econtext->ecxt_outertuple = slot;
-				if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
-										 false, hashtable->keepNulls,
-										 &hashvalue))
+
+				ResetExprContext(econtext);
+
+				hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(node->hash_expr,
+																	 econtext,
+																	 &isnull));
+
+				if (!isnull)
 					ExecParallelHashTableInsert(hashtable, slot, hashvalue);
 				hashtable->partialTuples++;
 			}
@@ -371,8 +382,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	hashstate->ps.plan = (Plan *) node;
 	hashstate->ps.state = estate;
 	hashstate->ps.ExecProcNode = ExecHash;
+	/* delay building hashtable until ExecHashTableCreate() in executor run */
 	hashstate->hashtable = NULL;
-	hashstate->hashkeys = NIL;	/* will be set by parent HashJoin */
 
 	/*
 	 * Miscellaneous initialization
@@ -393,12 +404,16 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	ExecInitResultTupleSlotTL(&hashstate->ps, &TTSOpsMinimalTuple);
 	hashstate->ps.ps_ProjInfo = NULL;
 
+	Assert(node->plan.qual == NIL);
+
 	/*
-	 * initialize child expressions
+	 * Delay initialization of hash_expr until ExecInitHashJoin().  We cannot
+	 * build the ExprState here as we don't yet know the join type we're going
+	 * to be hashing values for and we need to know that before calling
+	 * ExecBuildHash32Expr as the keep_nulls parameter depends on the join
+	 * type.
 	 */
-	Assert(node->plan.qual == NIL);
-	hashstate->hashkeys =
-		ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
+	hashstate->hash_expr = NULL;
 
 	return hashstate;
 }
@@ -429,7 +444,7 @@ ExecEndHash(HashState *node)
  * ----------------------------------------------------------------
  */
 HashJoinTable
-ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations, bool keepNulls)
+ExecHashTableCreate(HashState *state)
 {
 	Hash	   *node;
 	HashJoinTable hashtable;
@@ -440,10 +455,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	double		rows;
 	int			num_skew_mcvs;
 	int			log2_nbuckets;
-	int			nkeys;
-	int			i;
-	ListCell   *ho;
-	ListCell   *hc;
 	MemoryContext oldcxt;
 
 	/*
@@ -487,7 +498,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	hashtable->log2_nbuckets = log2_nbuckets;
 	hashtable->log2_nbuckets_optimal = log2_nbuckets;
 	hashtable->buckets.unshared = NULL;
-	hashtable->keepNulls = keepNulls;
 	hashtable->skewEnabled = false;
 	hashtable->skewBucket = NULL;
 	hashtable->skewBucketLen = 0;
@@ -540,32 +550,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 
 	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
 
-	/*
-	 * Get info about the hash functions to be used for each hash key. Also
-	 * remember whether the join operators are strict.
-	 */
-	nkeys = list_length(hashOperators);
-	hashtable->outer_hashfunctions = palloc_array(FmgrInfo, nkeys);
-	hashtable->inner_hashfunctions = palloc_array(FmgrInfo, nkeys);
-	hashtable->hashStrict = palloc_array(bool, nkeys);
-	hashtable->collations = palloc_array(Oid, nkeys);
-	i = 0;
-	forboth(ho, hashOperators, hc, hashCollations)
-	{
-		Oid			hashop = lfirst_oid(ho);
-		Oid			left_hashfn;
-		Oid			right_hashfn;
-
-		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
-			elog(ERROR, "could not find hash function for hash operator %u",
-				 hashop);
-		fmgr_info(left_hashfn, &hashtable->outer_hashfunctions[i]);
-		fmgr_info(right_hashfn, &hashtable->inner_hashfunctions[i]);
-		hashtable->hashStrict[i] = op_strict(hashop);
-		hashtable->collations[i] = lfirst_oid(hc);
-		i++;
-	}
-
 	if (nbatch > 1 && hashtable->parallel_state == NULL)
 	{
 		MemoryContext oldctx;
@@ -652,7 +636,7 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 		 * it.)
 		 */
 		if (nbatch > 1)
-			ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+			ExecHashBuildSkewHash(state, hashtable, node, num_skew_mcvs);
 
 		MemoryContextSwitchTo(oldcxt);
 	}
@@ -1803,103 +1787,6 @@ ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable,
 		heap_free_minimal_tuple(tuple);
 }
 
-/*
- * ExecHashGetHashValue
- *		Compute the hash value for a tuple
- *
- * The tuple to be tested must be in econtext->ecxt_outertuple (thus Vars in
- * the hashkeys expressions need to have OUTER_VAR as varno). If outer_tuple
- * is false (meaning it's the HashJoin's inner node, Hash), econtext,
- * hashkeys, and slot need to be from Hash, with hashkeys/slot referencing and
- * being suitable for tuples from the node below the Hash. Conversely, if
- * outer_tuple is true, econtext is from HashJoin, and hashkeys/slot need to
- * be appropriate for tuples from HashJoin's outer node.
- *
- * A true result means the tuple's hash value has been successfully computed
- * and stored at *hashvalue.  A false result means the tuple cannot match
- * because it contains a null attribute, and hence it should be discarded
- * immediately.  (If keep_nulls is true then false is never returned.)
- */
-bool
-ExecHashGetHashValue(HashJoinTable hashtable,
-					 ExprContext *econtext,
-					 List *hashkeys,
-					 bool outer_tuple,
-					 bool keep_nulls,
-					 uint32 *hashvalue)
-{
-	uint32		hashkey = 0;
-	FmgrInfo   *hashfunctions;
-	ListCell   *hk;
-	int			i = 0;
-	MemoryContext oldContext;
-
-	/*
-	 * We reset the eval context each time to reclaim any memory leaked in the
-	 * hashkey expressions.
-	 */
-	ResetExprContext(econtext);
-
-	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
-	if (outer_tuple)
-		hashfunctions = hashtable->outer_hashfunctions;
-	else
-		hashfunctions = hashtable->inner_hashfunctions;
-
-	foreach(hk, hashkeys)
-	{
-		ExprState  *keyexpr = (ExprState *) lfirst(hk);
-		Datum		keyval;
-		bool		isNull;
-
-		/* combine successive hashkeys by rotating */
-		hashkey = pg_rotate_left32(hashkey, 1);
-
-		/*
-		 * Get the join attribute value of the tuple
-		 */
-		keyval = ExecEvalExpr(keyexpr, econtext, &isNull);
-
-		/*
-		 * If the attribute is NULL, and the join operator is strict, then
-		 * this tuple cannot pass the join qual so we can reject it
-		 * immediately (unless we're scanning the outside of an outer join, in
-		 * which case we must not reject it).  Otherwise we act like the
-		 * hashcode of NULL is zero (this will support operators that act like
-		 * IS NOT DISTINCT, though not any more-random behavior).  We treat
-		 * the hash support function as strict even if the operator is not.
-		 *
-		 * Note: currently, all hashjoinable operators must be strict since
-		 * the hash index AM assumes that.  However, it takes so little extra
-		 * code here to allow non-strict that we may as well do it.
-		 */
-		if (isNull)
-		{
-			if (hashtable->hashStrict[i] && !keep_nulls)
-			{
-				MemoryContextSwitchTo(oldContext);
-				return false;	/* cannot match */
-			}
-			/* else, leave hashkey unmodified, equivalent to hashcode 0 */
-		}
-		else
-		{
-			/* Compute the hash function */
-			uint32		hkey;
-
-			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i], hashtable->collations[i], keyval));
-			hashkey ^= hkey;
-		}
-
-		i++;
-	}
-
-	MemoryContextSwitchTo(oldContext);
-
-	*hashvalue = hashkey;
-	return true;
-}
 
 /*
  * ExecHashGetBucketAndBatch
@@ -2372,7 +2259,8 @@ ExecReScanHash(HashState *node)
  *		based on available memory.
  */
 static void
-ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
+ExecHashBuildSkewHash(HashState *hashstate, HashJoinTable hashtable,
+					  Hash *node, int mcvsToUse)
 {
 	HeapTupleData *statsTuple;
 	AttStatsSlot sslot;
@@ -2400,7 +2288,6 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 	{
 		double		frac;
 		int			nbuckets;
-		FmgrInfo   *hashfunctions;
 		int			i;
 
 		if (mcvsToUse > sslot.nvalues)
@@ -2468,15 +2355,14 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 		 * ExecHashRemoveNextSkewBucket) and we want the least common MCVs to
 		 * be removed first.
 		 */
-		hashfunctions = hashtable->outer_hashfunctions;
 
 		for (i = 0; i < mcvsToUse; i++)
 		{
 			uint32		hashvalue;
 			int			bucket;
 
-			hashvalue = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[0],
-														 hashtable->collations[0],
+			hashvalue = DatumGetUInt32(FunctionCall1Coll(hashstate->skew_hashfunction,
+														 hashstate->skew_collation,
 														 sslot.values[i]));
 
 			/*
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 5429e68734..2f7170604d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -169,6 +169,7 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "utils/lsyscache.h"
 #include "utils/sharedtuplestore.h"
 #include "utils/wait_event.h"
 
@@ -331,10 +332,7 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
 				 * whoever gets here first will create the hash table and any
 				 * later arrivals will merely attach to it.
 				 */
-				hashtable = ExecHashTableCreate(hashNode,
-												node->hj_HashOperators,
-												node->hj_Collations,
-												HJ_FILL_INNER(node));
+				hashtable = ExecHashTableCreate(hashNode);
 				node->hj_HashTable = hashtable;
 
 				/*
@@ -820,9 +818,96 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	 */
 	{
 		HashState  *hashstate = (HashState *) innerPlanState(hjstate);
+		Hash	   *hash = (Hash *) hashstate->ps.plan;
 		TupleTableSlot *slot = hashstate->ps.ps_ResultTupleSlot;
+		Oid		   *outer_hashfuncid;
+		Oid		   *inner_hashfuncid;
+		bool	   *hash_strict;
+		ListCell   *lc;
+		int			nkeys;
+
 
 		hjstate->hj_HashTupleSlot = slot;
+
+		/*
+		 * Build ExprStates to obtain hash values for either side of the join.
+		 * This must be done here as ExecBuildHash32Expr needs to know how to
+		 * handle NULL inputs and the required handling of that depends on the
+		 * jointype.  We don't know the join type in ExecInitHash() and we
+		 * must build the ExprStates before ExecHashTableCreate() so we
+		 * properly attribute any SubPlans that exist in the hash expressions
+		 * to the correct PlanState.
+		 */
+		nkeys = list_length(node->hashoperators);
+
+		outer_hashfuncid = palloc_array(Oid, nkeys);
+		inner_hashfuncid = palloc_array(Oid, nkeys);
+		hash_strict = palloc_array(bool, nkeys);
+
+		/*
+		 * Determine the hash function for each side of the join for the given
+		 * hash operator.
+		 */
+		foreach(lc, node->hashoperators)
+		{
+			Oid			hashop = lfirst_oid(lc);
+			int			i = foreach_current_index(lc);
+
+			if (!get_op_hash_functions(hashop,
+									   &outer_hashfuncid[i],
+									   &inner_hashfuncid[i]))
+				elog(ERROR,
+					 "could not find hash function for hash operator %u",
+					 hashop);
+			hash_strict[i] = op_strict(hashop);
+		}
+
+		/*
+		 * Build an ExprState to generate the hash value for the expressions
+		 * on the outer of the join.  This ExprState must finish generating
+		 * the hash value when HJ_FILL_OUTER() is true.  Otherwise,
+		 * ExecBuildHash32Expr will set up the ExprState to abort early if it
+		 * finds a NULL.  In these cases, we don't need to store these tuples
+		 * in the hash table as the jointype does not require it.
+		 */
+		hjstate->hj_OuterHash =
+			ExecBuildHash32Expr(hjstate->js.ps.ps_ResultTupleDesc,
+								hjstate->js.ps.resultops,
+								outer_hashfuncid,
+								node->hashcollations,
+								node->hashkeys,
+								hash_strict,
+								&hjstate->js.ps,
+								0,
+								HJ_FILL_OUTER(hjstate));
+
+		/* As above, but for the inner side of the join */
+		hashstate->hash_expr =
+			ExecBuildHash32Expr(hashstate->ps.ps_ResultTupleDesc,
+								hashstate->ps.resultops,
+								inner_hashfuncid,
+								node->hashcollations,
+								hash->hashkeys,
+								hash_strict,
+								&hashstate->ps,
+								0,
+								HJ_FILL_INNER(hjstate));
+
+		/*
+		 * Set up the skew table hash function while we have a record of the
+		 * first key's hash function Oid.
+		 */
+		if (OidIsValid(hash->skewTable))
+		{
+			hashstate->skew_hashfunction = palloc0(sizeof(FmgrInfo));
+			hashstate->skew_collation = linitial_oid(node->hashcollations);
+			fmgr_info(outer_hashfuncid[0], hashstate->skew_hashfunction);
+		}
+
+		/* no need to keep these */
+		pfree(outer_hashfuncid);
+		pfree(inner_hashfuncid);
+		pfree(hash_strict);
 	}
 
 	/*
@@ -846,11 +931,6 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	hjstate->hj_CurSkewBucketNo = INVALID_SKEW_BUCKET_NO;
 	hjstate->hj_CurTuple = NULL;
 
-	hjstate->hj_OuterHashKeys = ExecInitExprList(node->hashkeys,
-												 (PlanState *) hjstate);
-	hjstate->hj_HashOperators = node->hashoperators;
-	hjstate->hj_Collations = node->hashcollations;
-
 	hjstate->hj_JoinState = HJ_BUILD_HASHTABLE;
 	hjstate->hj_MatchedOuter = false;
 	hjstate->hj_OuterNotEmpty = false;
@@ -918,17 +998,22 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode,
 
 		while (!TupIsNull(slot))
 		{
+			bool		isnull;
+
 			/*
 			 * We have to compute the tuple's hash value.
 			 */
 			ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
 
 			econtext->ecxt_outertuple = slot;
-			if (ExecHashGetHashValue(hashtable, econtext,
-									 hjstate->hj_OuterHashKeys,
-									 true,	/* outer tuple */
-									 HJ_FILL_OUTER(hjstate),
-									 hashvalue))
+
+			ResetExprContext(econtext);
+
+			*hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+																  econtext,
+																  &isnull));
+
+			if (!isnull)
 			{
 				/* remember outer relation is not empty for possible rescan */
 				hjstate->hj_OuterNotEmpty = true;
@@ -989,14 +1074,19 @@ ExecParallelHashJoinOuterGetTuple(PlanState *outerNode,
 
 		while (!TupIsNull(slot))
 		{
+			bool		isnull;
+
 			ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
 
 			econtext->ecxt_outertuple = slot;
-			if (ExecHashGetHashValue(hashtable, econtext,
-									 hjstate->hj_OuterHashKeys,
-									 true,	/* outer tuple */
-									 HJ_FILL_OUTER(hjstate),
-									 hashvalue))
+
+			ResetExprContext(econtext);
+
+			*hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+																  econtext,
+																  &isnull));
+
+			if (!isnull)
 				return slot;
 
 			/*
@@ -1518,15 +1608,20 @@ ExecParallelHashJoinPartitionOuter(HashJoinState *hjstate)
 	/* Execute outer plan, writing all tuples to shared tuplestores. */
 	for (;;)
 	{
+		bool		isnull;
+
 		slot = ExecProcNode(outerState);
 		if (TupIsNull(slot))
 			break;
 		econtext->ecxt_outertuple = slot;
-		if (ExecHashGetHashValue(hashtable, econtext,
-								 hjstate->hj_OuterHashKeys,
-								 true,	/* outer tuple */
-								 HJ_FILL_OUTER(hjstate),
-								 &hashvalue))
+
+		ResetExprContext(econtext);
+
+		hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+															 econtext,
+															 &isnull));
+
+		if (!isnull)
 		{
 			int			batchno;
 			int			bucketno;
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 27f94f9007..f394a1623c 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -1900,6 +1900,175 @@ llvm_compile_expr(ExprState *state)
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
+			case EEOP_HASHDATUM_SET_INITVAL:
+				{
+					LLVMValueRef v_initvalue;
+
+					v_initvalue = l_sizet_const(op->d.hashdatum_initvalue.init_value);
+
+					LLVMBuildStore(b, v_initvalue, v_resvaluep);
+					LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}
+
+			case EEOP_HASHDATUM_FIRST:
+			case EEOP_HASHDATUM_FIRST_STRICT:
+			case EEOP_HASHDATUM_NEXT32:
+			case EEOP_HASHDATUM_NEXT32_STRICT:
+				{
+					FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+					LLVMValueRef v_fcinfo;
+					LLVMValueRef v_fcinfo_isnull;
+					LLVMValueRef v_retval;
+					LLVMBasicBlockRef b_checkargnull;
+					LLVMBasicBlockRef b_ifnotnull;
+					LLVMBasicBlockRef b_ifnullblock;
+					LLVMValueRef v_argisnull;
+					LLVMValueRef v_prevhash = NULL;
+
+					/*
+					 * If not hashing the first value then we must load the
+					 * previous hash value and rotate it left one bit.
+					 */
+					if (opcode == EEOP_HASHDATUM_NEXT32 ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						LLVMValueRef v_tmp1;
+						LLVMValueRef v_tmp2;
+
+						/*
+						 * Fetch the previously hashed value from where the
+						 * EEOP_HASHDATUM_FIRST* operation stored it.
+						 */
+						v_prevhash = l_load(b, TypeSizeT, v_resvaluep,
+											"prevhash");
+
+						/*
+						 * Rotate bits left by 1 bit.  Be careful not to
+						 * overflow uint32 when working with size_t.
+						 */
+						v_tmp1 = LLVMBuildShl(b, v_prevhash, l_sizet_const(1),
+											  "");
+						v_tmp1 = LLVMBuildAnd(b, v_tmp1,
+											  l_sizet_const(0xffffffff), "");
+						v_tmp2 = LLVMBuildLShr(b, v_prevhash,
+											   l_sizet_const(31), "");
+						v_prevhash = LLVMBuildOr(b, v_tmp1, v_tmp2,
+												 "rotatedhash");
+					}
+
+					/*
+					 * Block for the actual function call, if args are
+					 * non-NULL.
+					 */
+					b_ifnotnull = l_bb_before_v(opblocks[opno + 1],
+												"b.%d.ifnotnull",
+												opno);
+
+					/* we expect the hash function to have 1 argument */
+					if (fcinfo->nargs != 1)
+						elog(ERROR, "incorrect number of function arguments");
+
+					v_fcinfo = l_ptr_const(fcinfo,
+										   l_ptr(StructFunctionCallInfoData));
+
+					b_checkargnull = l_bb_before_v(b_ifnotnull,
+												   "b.%d.isnull.0", opno);
+
+					LLVMBuildBr(b, b_checkargnull);
+
+					/*
+					 * Determine what to do if we find the argument to be
+					 * NULL.
+					 */
+					if (opcode == EEOP_HASHDATUM_FIRST_STRICT ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						b_ifnullblock = l_bb_before_v(b_ifnotnull,
+													  "b.%d.strictnull",
+													  opno);
+
+						LLVMPositionBuilderAtEnd(b, b_ifnullblock);
+
+						/*
+						 * In strict node, NULL inputs result in NULL.  Save
+						 * the NULL result and goto jumpdone.
+						 */
+						LLVMBuildStore(b, l_sbool_const(1), v_resnullp);
+						LLVMBuildStore(b, l_sizet_const(0), v_resvaluep);
+						LLVMBuildBr(b, opblocks[op->d.hashdatum.jumpdone]);
+					}
+					else
+					{
+						b_ifnullblock = l_bb_before_v(b_ifnotnull,
+													  "b.%d.null",
+													  opno);
+
+						LLVMPositionBuilderAtEnd(b, b_ifnullblock);
+
+
+						LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+
+						if (opcode == EEOP_HASHDATUM_NEXT32)
+						{
+							Assert(v_prevhash != NULL);
+
+							/*
+							 * Save the rotated hash value and skip to the
+							 * next op.
+							 */
+							LLVMBuildStore(b, v_prevhash, v_resvaluep);
+						}
+						else
+						{
+							Assert(opcode == EEOP_HASHDATUM_FIRST);
+
+							/*
+							 * Store a zero Datum when the Datum to hash is
+							 * NULL
+							 */
+							LLVMBuildStore(b, l_sizet_const(0), v_resvaluep);
+						}
+
+						LLVMBuildBr(b, opblocks[opno + 1]);
+					}
+
+					LLVMPositionBuilderAtEnd(b, b_checkargnull);
+
+					/* emit code to check if the input parameter is NULL */
+					v_argisnull = l_funcnull(b, v_fcinfo, 0);
+					LLVMBuildCondBr(b,
+									LLVMBuildICmp(b,
+												  LLVMIntEQ,
+												  v_argisnull,
+												  l_sbool_const(1),
+												  ""),
+									b_ifnullblock,
+									b_ifnotnull);
+
+					LLVMPositionBuilderAtEnd(b, b_ifnotnull);
+
+					/* call the hash function */
+					v_retval = BuildV1Call(context, b, mod, fcinfo,
+										   &v_fcinfo_isnull);
+
+					/*
+					 * For NEXT32 ops, XOR (^) the returned hash value with
+					 * the existing hash value.
+					 */
+					if (opcode == EEOP_HASHDATUM_NEXT32 ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+						v_retval = LLVMBuildXor(b, v_prevhash, v_retval,
+												"xorhash");
+
+					LLVMBuildStore(b, v_retval, v_resvaluep);
+					LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+
+					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}
+
 			case EEOP_CONVERT_ROWTYPE:
 				build_EvalXFunc(b, mod, "ExecEvalConvertRowtype",
 								v_state, op, v_econtext);
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 845f3422de..eec0aa699e 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -235,6 +235,13 @@ typedef enum ExprEvalOp
 	/* evaluate a single domain CHECK constraint */
 	EEOP_DOMAIN_CHECK,
 
+	/* evaluation steps for hashing */
+	EEOP_HASHDATUM_SET_INITVAL,
+	EEOP_HASHDATUM_FIRST,
+	EEOP_HASHDATUM_FIRST_STRICT,
+	EEOP_HASHDATUM_NEXT32,
+	EEOP_HASHDATUM_NEXT32_STRICT,
+
 	/* evaluate assorted special-purpose expression types */
 	EEOP_CONVERT_ROWTYPE,
 	EEOP_SCALARARRAYOP,
@@ -558,6 +565,23 @@ typedef struct ExprEvalStep
 			ErrorSaveContext *escontext;
 		}			domaincheck;
 
+		/* for EEOP_HASH_SET_INITVAL */
+		struct
+		{
+			Datum		init_value;
+
+		}			hashdatum_initvalue;
+
+		/* for EEOP_HASHDATUM_(FIRST|NEXT32)[_STRICT] */
+		struct
+		{
+			FmgrInfo   *finfo;	/* function's lookup data */
+			FunctionCallInfo fcinfo_data;	/* arguments etc */
+			/* faster to access without additional indirection: */
+			PGFunction	fn_addr;	/* actual call address */
+			int			jumpdone;	/* jump here on null */
+		}			hashdatum;
+
 		/* for EEOP_CONVERT_ROWTYPE */
 		struct
 		{
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9770752ea3..046a7fb69b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -285,6 +285,13 @@ extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
 extern List *ExecInitExprList(List *nodes, PlanState *parent);
 extern ExprState *ExecBuildAggTrans(AggState *aggstate, struct AggStatePerPhaseData *phase,
 									bool doSort, bool doHash, bool nullcheck);
+extern ExprState *ExecBuildHash32Expr(TupleDesc desc,
+									  const TupleTableSlotOps *ops,
+									  const Oid *hashfunc_oids,
+									  const List *collations,
+									  const List *hash_exprs,
+									  const bool *opstrict, PlanState *parent,
+									  uint32 init_value, bool keep_nulls);
 extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const TupleTableSlotOps *lops, const TupleTableSlotOps *rops,
 										 int numCols,
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 9197846cda..2d8ed8688c 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -313,8 +313,6 @@ typedef struct HashJoinTableData
 		dsa_pointer_atomic *shared;
 	}			buckets;
 
-	bool		keepNulls;		/* true to store unmatchable NULL tuples */
-
 	bool		skewEnabled;	/* are we using skew optimization? */
 	HashSkewBucket **skewBucket;	/* hashtable of skew buckets */
 	int			skewBucketLen;	/* size of skewBucket array (a power of 2!) */
@@ -343,16 +341,6 @@ typedef struct HashJoinTableData
 	BufFile   **innerBatchFile; /* buffered virtual temp file per batch */
 	BufFile   **outerBatchFile; /* buffered virtual temp file per batch */
 
-	/*
-	 * Info about the datatype-specific hash functions for the datatypes being
-	 * hashed. These are arrays of the same length as the number of hash join
-	 * clauses (hash keys).
-	 */
-	FmgrInfo   *outer_hashfunctions;	/* lookup data for hash functions */
-	FmgrInfo   *inner_hashfunctions;	/* lookup data for hash functions */
-	bool	   *hashStrict;		/* is each hash join operator strict? */
-	Oid		   *collations;
-
 	Size		spaceUsed;		/* memory space currently used by tuples */
 	Size		spaceAllowed;	/* upper limit for space used */
 	Size		spacePeak;		/* peak space used */
diff --git a/src/include/executor/nodeHash.h b/src/include/executor/nodeHash.h
index a95911c2fe..e4eb7bc635 100644
--- a/src/include/executor/nodeHash.h
+++ b/src/include/executor/nodeHash.h
@@ -24,8 +24,7 @@ extern Node *MultiExecHash(HashState *node);
 extern void ExecEndHash(HashState *node);
 extern void ExecReScanHash(HashState *node);
 
-extern HashJoinTable ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
-										 bool keepNulls);
+extern HashJoinTable ExecHashTableCreate(HashState *state);
 extern void ExecParallelHashTableAlloc(HashJoinTable hashtable,
 									   int batchno);
 extern void ExecHashTableDestroy(HashJoinTable hashtable);
@@ -43,12 +42,6 @@ extern void ExecParallelHashTableInsert(HashJoinTable hashtable,
 extern void ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable,
 													TupleTableSlot *slot,
 													uint32 hashvalue);
-extern bool ExecHashGetHashValue(HashJoinTable hashtable,
-								 ExprContext *econtext,
-								 List *hashkeys,
-								 bool outer_tuple,
-								 bool keep_nulls,
-								 uint32 *hashvalue);
 extern void ExecHashGetBucketAndBatch(HashJoinTable hashtable,
 									  uint32 hashvalue,
 									  int *bucketno,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 87f1519ec6..af7d8fd1e7 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2184,8 +2184,7 @@ typedef struct MergeJoinState
  *	 HashJoinState information
  *
  *		hashclauses				original form of the hashjoin condition
- *		hj_OuterHashKeys		the outer hash keys in the hashjoin condition
- *		hj_HashOperators		the join operators in the hashjoin condition
+ *		hj_OuterHash			ExprState for hashing outer keys
  *		hj_HashTable			hash table for the hashjoin
  *								(NULL if table not built yet)
  *		hj_CurHashValue			hash value for current outer tuple
@@ -2215,9 +2214,7 @@ typedef struct HashJoinState
 {
 	JoinState	js;				/* its first field is NodeTag */
 	ExprState  *hashclauses;
-	List	   *hj_OuterHashKeys;	/* list of ExprState nodes */
-	List	   *hj_HashOperators;	/* list of operator OIDs */
-	List	   *hj_Collations;
+	ExprState  *hj_OuterHash;
 	HashJoinTable hj_HashTable;
 	uint32		hj_CurHashValue;
 	int			hj_CurBucketNo;
@@ -2770,7 +2767,10 @@ typedef struct HashState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	HashJoinTable hashtable;	/* hash table for the hashjoin */
-	List	   *hashkeys;		/* list of ExprState nodes */
+	ExprState  *hash_expr;		/* ExprState to get hash value */
+
+	FmgrInfo   *skew_hashfunction;	/* lookup data for skew hash function */
+	Oid			skew_collation; /* collation to call skew_hashfunction with */
 
 	/*
 	 * In a parallelized hash join, the leader retains a pointer to the
-- 
2.34.1

#5Tels
nospam-pg-abuse@bloodgate.com
In reply to: David Rowley (#4)
Re: Speed up Hash Join by teaching ExprState about hashing

Hello David,

you wrote:

v4 patch attached. If nobody else wants to look at this then I'm
planning on pushing it soon.

Had a very brief look at this bit caught my attentioon:

+		EEO_CASE(EEOP_HASHDATUM_NEXT32_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+			uint32		existing_hash = DatumGetUInt32(*op->resvalue);
+			uint32		hashvalue;
+
+			/* combine successive hash values by rotating */
+			existing_hash = pg_rotate_left32(existing_hash, 1);
+
+			if (fcinfo->args[0].isnull)
+			{

Is it nec. to rotate existing_hash here before checking for isnull?
Because in case of isnull, isn't the result of the rotate thrown away?

Or in other words, mnaybe this bit here can be moved to after the isnull
check:

+			/* combine successive hash values by rotating */
+			existing_hash = pg_rotate_left32(existing_hash, 1);

--
Best regards,

Tels

#6David Rowley
dgrowleyml@gmail.com
In reply to: Tels (#5)
1 attachment(s)
Re: Speed up Hash Join by teaching ExprState about hashing

Thanks for having a look.

On Sat, 17 Aug 2024 at 23:21, Tels <nospam-pg-abuse@bloodgate.com> wrote:

Is it nec. to rotate existing_hash here before checking for isnull?
Because in case of isnull, isn't the result of the rotate thrown away?

Yeah, I think that it's worthwhile moving that to after the isnull
check so as not to waste the effort. Unfortunately doing the same in
the JIT code meant copying and pasting the code that emits the
rotation code.

The attached v5 patch includes this change.

David

Attachments:

v5-0001-Speed-up-Hash-Join-by-making-ExprStates-support-h.patchapplication/octet-stream; name=v5-0001-Speed-up-Hash-Join-by-making-ExprStates-support-h.patchDownload
From 4c85d330f625b87e8bc12ac7d462bf09ff58e32c Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Fri, 10 May 2024 20:22:07 +1200
Subject: [PATCH v5] Speed up Hash Join by making ExprStates support hashing

Here we add ExprState support for obtaining a 32-bit hash value from a
list of expressions.  This allows both faster hashing and also JIT
compilation of these expressions.  This is especially useful when hash
joins have multiple join keys as the previous code called ExecEvalExpr on
each hash join key individually and that was inefficient as tuple
deformation would have only taken into account one key at a time, which
could lead to walking the tuple once for each join key.  With the new
code, we'll determine the maximum attribute required and deform the tuple
to that point only once.

Some performance tests done with this change have shown up to a 20%
performance increase of a query containing a Hash Join without JIT
compilation and up to a 26% performance increase when JIT is enabled and
optimization and inlining were performed by the JIT compiler.  The
performance increase with 1 join column was less with a 14% increase
with and without JIT.  This test was done using a fairly small hash
table and a large number of hash probes.  The increase will likely be
less with large tables, especially ones larger than L3 cache as memory
pressure is more likely to be the limiting factor there.

This commit only addresses Hash Joins, but lays expression evaluation
and JIT compilation infrastructure for other hashing needs such as Hash
Aggregate.  That will be done as a follow-up commit.

Author: David Rowley
Reviewed-by: Alexey Dvoichenkov <alexey@hyperplane.net>
Discussion: https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm%3Dm0L%2BGc%3DT%3DkEa4g%40mail.gmail.com
---
 src/backend/executor/execExpr.c       | 141 ++++++++++++++++++
 src/backend/executor/execExprInterp.c | 110 ++++++++++++++
 src/backend/executor/nodeHash.c       | 195 ++++++------------------
 src/backend/executor/nodeHashjoin.c   | 143 +++++++++++++++---
 src/backend/jit/llvm/llvmjit_expr.c   | 204 ++++++++++++++++++++++++++
 src/include/executor/execExpr.h       |  24 +++
 src/include/executor/executor.h       |   7 +
 src/include/executor/hashjoin.h       |  12 --
 src/include/executor/nodeHash.h       |   9 +-
 src/include/nodes/execnodes.h         |  12 +-
 10 files changed, 655 insertions(+), 202 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 66dda8e5e6..ddbf760746 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3969,6 +3969,147 @@ ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 	}
 }
 
+/*
+ * Build an ExprState that calls the given hash function(s) on the given
+ * 'hash_exprs'.  When multiple expressions are present, the hash values
+ * returned by each function are combined to produce a single hash value.
+ *
+ * desc: tuple descriptor for the to-be-hashed expressions
+ * ops: TupleTableSlotOps for the TupleDesc
+ * hashfunc_oids: Oid for each hash function to call, one for each 'hash_expr'
+ * collations: collation to use when calling the hash function.
+ * hash_expr: list of expressions to hash the value of
+ * opstrict: array corresponding to the 'hashfunc_oids' to store op_strict()
+ * parent: PlanState node that the 'hash_exprs' will be evaluated at
+ * init_value: Normally 0, but can be set to other values to seed the hash
+ * with some other value.  Using non-zero is slightly less efficient but can
+ * be useful.
+ * keep_nulls: if true, evaluation of the returned ExprState will abort early
+ * returning NULL if the given hash function is strict and the Datum to hash
+ * is null.  When set to false, any NULL input Datums are skipped.
+ */
+ExprState *
+ExecBuildHash32Expr(TupleDesc desc, const TupleTableSlotOps *ops,
+					const Oid *hashfunc_oids, const List *collations,
+					const List *hash_exprs, const bool *opstrict,
+					PlanState *parent, uint32 init_value, bool keep_nulls)
+{
+	ExprState  *state = makeNode(ExprState);
+	ExprEvalStep scratch = {0};
+	List	   *adjust_jumps = NIL;
+	ListCell   *lc;
+	ListCell   *lc2;
+	intptr_t	strict_opcode;
+	intptr_t	opcode;
+
+	Assert(list_length(hash_exprs) == list_length(collations));
+
+	state->parent = parent;
+
+	/* Insert setup steps as needed. */
+	ExecCreateExprSetupSteps(state, (Node *) hash_exprs);
+
+	if (init_value == 0)
+	{
+		/*
+		 * No initial value, so we can assign the result of the hash function
+		 * for the first hash_expr without having to concern ourselves with
+		 * combining the result with any initial value.
+		 */
+		strict_opcode = EEOP_HASHDATUM_FIRST_STRICT;
+		opcode = EEOP_HASHDATUM_FIRST;
+	}
+	else
+	{
+		/* Set up operation to set the initial value. */
+		scratch.opcode = EEOP_HASHDATUM_SET_INITVAL;
+		scratch.d.hashdatum_initvalue.init_value = UInt32GetDatum(init_value);
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+
+		ExprEvalPushStep(state, &scratch);
+
+		/*
+		 * When using an initial value use the NEXT32/NEXT32_STRICT ops as the
+		 * FIRST/FIRST_STRICT ops would overwrite the stored initial value.
+		 */
+		strict_opcode = EEOP_HASHDATUM_NEXT32_STRICT;
+		opcode = EEOP_HASHDATUM_NEXT32;
+	}
+
+	forboth(lc, hash_exprs, lc2, collations)
+	{
+		Expr	   *expr = (Expr *) lfirst(lc);
+		FmgrInfo   *finfo;
+		FunctionCallInfo fcinfo;
+		int			i = foreach_current_index(lc);
+		Oid			funcid;
+		Oid			inputcollid = lfirst_oid(lc2);
+
+		funcid = hashfunc_oids[i];
+
+		/* Allocate hash function lookup data. */
+		finfo = palloc0(sizeof(FmgrInfo));
+		fcinfo = palloc0(SizeForFunctionCallInfo(1));
+
+		fmgr_info(funcid, finfo);
+
+		/*
+		 * Build the steps to evaluate the hash function's argument have it so
+		 * the value of that is stored in the 0th argument of the hash func.
+		 */
+		ExecInitExprRec(expr,
+						state,
+						&fcinfo->args[0].value,
+						&fcinfo->args[0].isnull);
+
+		scratch.resvalue = &state->resvalue;
+		scratch.resnull = &state->resnull;
+
+		/* Initialize function call parameter structure too */
+		InitFunctionCallInfoData(*fcinfo, finfo, 1, inputcollid, NULL, NULL);
+
+		scratch.d.hashdatum.finfo = finfo;
+		scratch.d.hashdatum.fcinfo_data = fcinfo;
+		scratch.d.hashdatum.fn_addr = finfo->fn_addr;
+
+		scratch.opcode = opstrict[i] && !keep_nulls ? strict_opcode : opcode;
+		scratch.d.hashdatum.jumpdone = -1;
+
+		ExprEvalPushStep(state, &scratch);
+		adjust_jumps = lappend_int(adjust_jumps, state->steps_len - 1);
+
+		/*
+		 * For subsequent keys we must combine the hash value with the
+		 * previous hashes.
+		 */
+		strict_opcode = EEOP_HASHDATUM_NEXT32_STRICT;
+		opcode = EEOP_HASHDATUM_NEXT32;
+	}
+
+	/* adjust jump targets */
+	foreach(lc, adjust_jumps)
+	{
+		ExprEvalStep *as = &state->steps[lfirst_int(lc)];
+
+		Assert(as->opcode == EEOP_HASHDATUM_FIRST ||
+			   as->opcode == EEOP_HASHDATUM_FIRST_STRICT ||
+			   as->opcode == EEOP_HASHDATUM_NEXT32 ||
+			   as->opcode == EEOP_HASHDATUM_NEXT32_STRICT);
+		Assert(as->d.hashdatum.jumpdone == -1);
+		as->d.hashdatum.jumpdone = state->steps_len;
+	}
+
+	scratch.resvalue = NULL;
+	scratch.resnull = NULL;
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
+
 /*
  * Build equality expression that can be evaluated using ExecQual(), returning
  * true if the expression context's inner/outer tuple are NOT DISTINCT. I.e
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ea47c4d6f9..22d008acbd 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -477,6 +477,11 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_DOMAIN_TESTVAL,
 		&&CASE_EEOP_DOMAIN_NOTNULL,
 		&&CASE_EEOP_DOMAIN_CHECK,
+		&&CASE_EEOP_HASHDATUM_SET_INITVAL,
+		&&CASE_EEOP_HASHDATUM_FIRST,
+		&&CASE_EEOP_HASHDATUM_FIRST_STRICT,
+		&&CASE_EEOP_HASHDATUM_NEXT32,
+		&&CASE_EEOP_HASHDATUM_NEXT32_STRICT,
 		&&CASE_EEOP_CONVERT_ROWTYPE,
 		&&CASE_EEOP_SCALARARRAYOP,
 		&&CASE_EEOP_HASHED_SCALARARRAYOP,
@@ -1543,6 +1548,111 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			EEO_NEXT();
 		}
 
+		EEO_CASE(EEOP_HASHDATUM_SET_INITVAL)
+		{
+			*op->resvalue = op->d.hashdatum_initvalue.init_value;
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_FIRST)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			/*
+			 * Save the Datum on non-null inputs, otherwise store 0 so that
+			 * any NEXT32 operations combine with an initialized value.
+			 */
+			if (!fcinfo->args[0].isnull)
+				*op->resvalue = op->d.hashdatum.fn_addr(fcinfo);
+			else
+				*op->resvalue = (Datum) 0;
+
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_FIRST_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			if (fcinfo->args[0].isnull)
+			{
+				/*
+				 * With strict we have the expression return NULL instead of
+				 * ignoring NULL input values.  We've nothing more to do after
+				 * finding a NULL.
+				 */
+				*op->resnull = true;
+				*op->resvalue = (Datum) 0;
+				EEO_JUMP(op->d.hashdatum.jumpdone);
+			}
+
+			/* execute the hash function and save the resulting value */
+			*op->resvalue = op->d.hashdatum.fn_addr(fcinfo);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_NEXT32)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+			uint32		existing_hash = DatumGetUInt32(*op->resvalue);
+
+			/* combine successive hash values by rotating */
+			existing_hash = pg_rotate_left32(existing_hash, 1);
+
+			/* leave the hash value alone on NULL inputs */
+			if (!fcinfo->args[0].isnull)
+			{
+				uint32		hashvalue;
+
+				/* execute hash func and combine with previous hash value */
+				hashvalue = DatumGetUInt32(op->d.hashdatum.fn_addr(fcinfo));
+				existing_hash = existing_hash ^ hashvalue;
+			}
+
+			*op->resvalue = UInt32GetDatum(existing_hash);
+			*op->resnull = false;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_HASHDATUM_NEXT32_STRICT)
+		{
+			FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+
+			if (fcinfo->args[0].isnull)
+			{
+				/*
+				 * With strict we have the expression return NULL instead of
+				 * ignoring NULL input values.  We've nothing more to do after
+				 * finding a NULL.
+				 */
+				*op->resnull = true;
+				*op->resvalue = (Datum) 0;
+				EEO_JUMP(op->d.hashdatum.jumpdone);
+			}
+			else
+			{
+				uint32		existing_hash = DatumGetUInt32(*op->resvalue);
+				uint32		hashvalue;
+
+				/* combine successive hash values by rotating */
+				existing_hash = pg_rotate_left32(existing_hash, 1);
+
+				/* execute hash func and combine with previous hash value */
+				hashvalue = DatumGetUInt32(op->d.hashdatum.fn_addr(fcinfo));
+				*op->resvalue = UInt32GetDatum(existing_hash ^ hashvalue);
+				*op->resnull = false;
+			}
+
+			EEO_NEXT();
+		}
+
 		EEO_CASE(EEOP_XMLEXPR)
 		{
 			/* too complex for an inline implementation */
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 61480733a1..86b07f23cb 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -47,7 +47,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
 static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
-static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
+static void ExecHashBuildSkewHash(HashState *hashstate,
+								  HashJoinTable hashtable, Hash *node,
 								  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 									TupleTableSlot *slot,
@@ -138,11 +139,9 @@ static void
 MultiExecPrivateHash(HashState *node)
 {
 	PlanState  *outerNode;
-	List	   *hashkeys;
 	HashJoinTable hashtable;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
-	uint32		hashvalue;
 
 	/*
 	 * get state info from node
@@ -153,7 +152,6 @@ MultiExecPrivateHash(HashState *node)
 	/*
 	 * set expression context
 	 */
-	hashkeys = node->hashkeys;
 	econtext = node->ps.ps_ExprContext;
 
 	/*
@@ -162,17 +160,30 @@ MultiExecPrivateHash(HashState *node)
 	 */
 	for (;;)
 	{
+		bool		isnull;
+		Datum		hashdatum;
+
 		slot = ExecProcNode(outerNode);
 		if (TupIsNull(slot))
 			break;
 		/* We have to compute the hash value */
 		econtext->ecxt_outertuple = slot;
-		if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
-								 false, hashtable->keepNulls,
-								 &hashvalue))
+
+		ResetExprContext(econtext);
+
+		hashdatum = ExecEvalExprSwitchContext(node->hash_expr, econtext,
+											  &isnull);
+
+		if (!isnull)
 		{
+			uint32		hashvalue = DatumGetUInt32(hashdatum);
 			int			bucketNumber;
 
+#if USE_ASSERT_CHECKING
+			/* XXX just for testing. Not for commit */
+			elog(DEBUG1, "hashdatum = " UINT64_FORMAT, (uint64) hashdatum);
+#endif
+
 			bucketNumber = ExecHashGetSkewBucket(hashtable, hashvalue);
 			if (bucketNumber != INVALID_SKEW_BUCKET_NO)
 			{
@@ -215,7 +226,6 @@ MultiExecParallelHash(HashState *node)
 {
 	ParallelHashJoinState *pstate;
 	PlanState  *outerNode;
-	List	   *hashkeys;
 	HashJoinTable hashtable;
 	TupleTableSlot *slot;
 	ExprContext *econtext;
@@ -232,7 +242,6 @@ MultiExecParallelHash(HashState *node)
 	/*
 	 * set expression context
 	 */
-	hashkeys = node->hashkeys;
 	econtext = node->ps.ps_ExprContext;
 
 	/*
@@ -279,13 +288,20 @@ MultiExecParallelHash(HashState *node)
 			ExecParallelHashTableSetCurrentBatch(hashtable, 0);
 			for (;;)
 			{
+				bool		isnull;
+
 				slot = ExecProcNode(outerNode);
 				if (TupIsNull(slot))
 					break;
 				econtext->ecxt_outertuple = slot;
-				if (ExecHashGetHashValue(hashtable, econtext, hashkeys,
-										 false, hashtable->keepNulls,
-										 &hashvalue))
+
+				ResetExprContext(econtext);
+
+				hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(node->hash_expr,
+																	 econtext,
+																	 &isnull));
+
+				if (!isnull)
 					ExecParallelHashTableInsert(hashtable, slot, hashvalue);
 				hashtable->partialTuples++;
 			}
@@ -371,8 +387,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	hashstate->ps.plan = (Plan *) node;
 	hashstate->ps.state = estate;
 	hashstate->ps.ExecProcNode = ExecHash;
+	/* delay building hashtable until ExecHashTableCreate() in executor run */
 	hashstate->hashtable = NULL;
-	hashstate->hashkeys = NIL;	/* will be set by parent HashJoin */
 
 	/*
 	 * Miscellaneous initialization
@@ -393,12 +409,16 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
 	ExecInitResultTupleSlotTL(&hashstate->ps, &TTSOpsMinimalTuple);
 	hashstate->ps.ps_ProjInfo = NULL;
 
+	Assert(node->plan.qual == NIL);
+
 	/*
-	 * initialize child expressions
+	 * Delay initialization of hash_expr until ExecInitHashJoin().  We cannot
+	 * build the ExprState here as we don't yet know the join type we're going
+	 * to be hashing values for and we need to know that before calling
+	 * ExecBuildHash32Expr as the keep_nulls parameter depends on the join
+	 * type.
 	 */
-	Assert(node->plan.qual == NIL);
-	hashstate->hashkeys =
-		ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
+	hashstate->hash_expr = NULL;
 
 	return hashstate;
 }
@@ -429,7 +449,7 @@ ExecEndHash(HashState *node)
  * ----------------------------------------------------------------
  */
 HashJoinTable
-ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations, bool keepNulls)
+ExecHashTableCreate(HashState *state)
 {
 	Hash	   *node;
 	HashJoinTable hashtable;
@@ -440,10 +460,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	double		rows;
 	int			num_skew_mcvs;
 	int			log2_nbuckets;
-	int			nkeys;
-	int			i;
-	ListCell   *ho;
-	ListCell   *hc;
 	MemoryContext oldcxt;
 
 	/*
@@ -487,7 +503,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	hashtable->log2_nbuckets = log2_nbuckets;
 	hashtable->log2_nbuckets_optimal = log2_nbuckets;
 	hashtable->buckets.unshared = NULL;
-	hashtable->keepNulls = keepNulls;
 	hashtable->skewEnabled = false;
 	hashtable->skewBucket = NULL;
 	hashtable->skewBucketLen = 0;
@@ -540,32 +555,6 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 
 	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
 
-	/*
-	 * Get info about the hash functions to be used for each hash key. Also
-	 * remember whether the join operators are strict.
-	 */
-	nkeys = list_length(hashOperators);
-	hashtable->outer_hashfunctions = palloc_array(FmgrInfo, nkeys);
-	hashtable->inner_hashfunctions = palloc_array(FmgrInfo, nkeys);
-	hashtable->hashStrict = palloc_array(bool, nkeys);
-	hashtable->collations = palloc_array(Oid, nkeys);
-	i = 0;
-	forboth(ho, hashOperators, hc, hashCollations)
-	{
-		Oid			hashop = lfirst_oid(ho);
-		Oid			left_hashfn;
-		Oid			right_hashfn;
-
-		if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
-			elog(ERROR, "could not find hash function for hash operator %u",
-				 hashop);
-		fmgr_info(left_hashfn, &hashtable->outer_hashfunctions[i]);
-		fmgr_info(right_hashfn, &hashtable->inner_hashfunctions[i]);
-		hashtable->hashStrict[i] = op_strict(hashop);
-		hashtable->collations[i] = lfirst_oid(hc);
-		i++;
-	}
-
 	if (nbatch > 1 && hashtable->parallel_state == NULL)
 	{
 		MemoryContext oldctx;
@@ -652,7 +641,7 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 		 * it.)
 		 */
 		if (nbatch > 1)
-			ExecHashBuildSkewHash(hashtable, node, num_skew_mcvs);
+			ExecHashBuildSkewHash(state, hashtable, node, num_skew_mcvs);
 
 		MemoryContextSwitchTo(oldcxt);
 	}
@@ -1803,103 +1792,6 @@ ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable,
 		heap_free_minimal_tuple(tuple);
 }
 
-/*
- * ExecHashGetHashValue
- *		Compute the hash value for a tuple
- *
- * The tuple to be tested must be in econtext->ecxt_outertuple (thus Vars in
- * the hashkeys expressions need to have OUTER_VAR as varno). If outer_tuple
- * is false (meaning it's the HashJoin's inner node, Hash), econtext,
- * hashkeys, and slot need to be from Hash, with hashkeys/slot referencing and
- * being suitable for tuples from the node below the Hash. Conversely, if
- * outer_tuple is true, econtext is from HashJoin, and hashkeys/slot need to
- * be appropriate for tuples from HashJoin's outer node.
- *
- * A true result means the tuple's hash value has been successfully computed
- * and stored at *hashvalue.  A false result means the tuple cannot match
- * because it contains a null attribute, and hence it should be discarded
- * immediately.  (If keep_nulls is true then false is never returned.)
- */
-bool
-ExecHashGetHashValue(HashJoinTable hashtable,
-					 ExprContext *econtext,
-					 List *hashkeys,
-					 bool outer_tuple,
-					 bool keep_nulls,
-					 uint32 *hashvalue)
-{
-	uint32		hashkey = 0;
-	FmgrInfo   *hashfunctions;
-	ListCell   *hk;
-	int			i = 0;
-	MemoryContext oldContext;
-
-	/*
-	 * We reset the eval context each time to reclaim any memory leaked in the
-	 * hashkey expressions.
-	 */
-	ResetExprContext(econtext);
-
-	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
-	if (outer_tuple)
-		hashfunctions = hashtable->outer_hashfunctions;
-	else
-		hashfunctions = hashtable->inner_hashfunctions;
-
-	foreach(hk, hashkeys)
-	{
-		ExprState  *keyexpr = (ExprState *) lfirst(hk);
-		Datum		keyval;
-		bool		isNull;
-
-		/* combine successive hashkeys by rotating */
-		hashkey = pg_rotate_left32(hashkey, 1);
-
-		/*
-		 * Get the join attribute value of the tuple
-		 */
-		keyval = ExecEvalExpr(keyexpr, econtext, &isNull);
-
-		/*
-		 * If the attribute is NULL, and the join operator is strict, then
-		 * this tuple cannot pass the join qual so we can reject it
-		 * immediately (unless we're scanning the outside of an outer join, in
-		 * which case we must not reject it).  Otherwise we act like the
-		 * hashcode of NULL is zero (this will support operators that act like
-		 * IS NOT DISTINCT, though not any more-random behavior).  We treat
-		 * the hash support function as strict even if the operator is not.
-		 *
-		 * Note: currently, all hashjoinable operators must be strict since
-		 * the hash index AM assumes that.  However, it takes so little extra
-		 * code here to allow non-strict that we may as well do it.
-		 */
-		if (isNull)
-		{
-			if (hashtable->hashStrict[i] && !keep_nulls)
-			{
-				MemoryContextSwitchTo(oldContext);
-				return false;	/* cannot match */
-			}
-			/* else, leave hashkey unmodified, equivalent to hashcode 0 */
-		}
-		else
-		{
-			/* Compute the hash function */
-			uint32		hkey;
-
-			hkey = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i], hashtable->collations[i], keyval));
-			hashkey ^= hkey;
-		}
-
-		i++;
-	}
-
-	MemoryContextSwitchTo(oldContext);
-
-	*hashvalue = hashkey;
-	return true;
-}
 
 /*
  * ExecHashGetBucketAndBatch
@@ -2372,7 +2264,8 @@ ExecReScanHash(HashState *node)
  *		based on available memory.
  */
 static void
-ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
+ExecHashBuildSkewHash(HashState *hashstate, HashJoinTable hashtable,
+					  Hash *node, int mcvsToUse)
 {
 	HeapTupleData *statsTuple;
 	AttStatsSlot sslot;
@@ -2400,7 +2293,6 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 	{
 		double		frac;
 		int			nbuckets;
-		FmgrInfo   *hashfunctions;
 		int			i;
 
 		if (mcvsToUse > sslot.nvalues)
@@ -2468,15 +2360,14 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
 		 * ExecHashRemoveNextSkewBucket) and we want the least common MCVs to
 		 * be removed first.
 		 */
-		hashfunctions = hashtable->outer_hashfunctions;
 
 		for (i = 0; i < mcvsToUse; i++)
 		{
 			uint32		hashvalue;
 			int			bucket;
 
-			hashvalue = DatumGetUInt32(FunctionCall1Coll(&hashfunctions[0],
-														 hashtable->collations[0],
+			hashvalue = DatumGetUInt32(FunctionCall1Coll(hashstate->skew_hashfunction,
+														 hashstate->skew_collation,
 														 sslot.values[i]));
 
 			/*
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 5429e68734..2f7170604d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -169,6 +169,7 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "utils/lsyscache.h"
 #include "utils/sharedtuplestore.h"
 #include "utils/wait_event.h"
 
@@ -331,10 +332,7 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
 				 * whoever gets here first will create the hash table and any
 				 * later arrivals will merely attach to it.
 				 */
-				hashtable = ExecHashTableCreate(hashNode,
-												node->hj_HashOperators,
-												node->hj_Collations,
-												HJ_FILL_INNER(node));
+				hashtable = ExecHashTableCreate(hashNode);
 				node->hj_HashTable = hashtable;
 
 				/*
@@ -820,9 +818,96 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	 */
 	{
 		HashState  *hashstate = (HashState *) innerPlanState(hjstate);
+		Hash	   *hash = (Hash *) hashstate->ps.plan;
 		TupleTableSlot *slot = hashstate->ps.ps_ResultTupleSlot;
+		Oid		   *outer_hashfuncid;
+		Oid		   *inner_hashfuncid;
+		bool	   *hash_strict;
+		ListCell   *lc;
+		int			nkeys;
+
 
 		hjstate->hj_HashTupleSlot = slot;
+
+		/*
+		 * Build ExprStates to obtain hash values for either side of the join.
+		 * This must be done here as ExecBuildHash32Expr needs to know how to
+		 * handle NULL inputs and the required handling of that depends on the
+		 * jointype.  We don't know the join type in ExecInitHash() and we
+		 * must build the ExprStates before ExecHashTableCreate() so we
+		 * properly attribute any SubPlans that exist in the hash expressions
+		 * to the correct PlanState.
+		 */
+		nkeys = list_length(node->hashoperators);
+
+		outer_hashfuncid = palloc_array(Oid, nkeys);
+		inner_hashfuncid = palloc_array(Oid, nkeys);
+		hash_strict = palloc_array(bool, nkeys);
+
+		/*
+		 * Determine the hash function for each side of the join for the given
+		 * hash operator.
+		 */
+		foreach(lc, node->hashoperators)
+		{
+			Oid			hashop = lfirst_oid(lc);
+			int			i = foreach_current_index(lc);
+
+			if (!get_op_hash_functions(hashop,
+									   &outer_hashfuncid[i],
+									   &inner_hashfuncid[i]))
+				elog(ERROR,
+					 "could not find hash function for hash operator %u",
+					 hashop);
+			hash_strict[i] = op_strict(hashop);
+		}
+
+		/*
+		 * Build an ExprState to generate the hash value for the expressions
+		 * on the outer of the join.  This ExprState must finish generating
+		 * the hash value when HJ_FILL_OUTER() is true.  Otherwise,
+		 * ExecBuildHash32Expr will set up the ExprState to abort early if it
+		 * finds a NULL.  In these cases, we don't need to store these tuples
+		 * in the hash table as the jointype does not require it.
+		 */
+		hjstate->hj_OuterHash =
+			ExecBuildHash32Expr(hjstate->js.ps.ps_ResultTupleDesc,
+								hjstate->js.ps.resultops,
+								outer_hashfuncid,
+								node->hashcollations,
+								node->hashkeys,
+								hash_strict,
+								&hjstate->js.ps,
+								0,
+								HJ_FILL_OUTER(hjstate));
+
+		/* As above, but for the inner side of the join */
+		hashstate->hash_expr =
+			ExecBuildHash32Expr(hashstate->ps.ps_ResultTupleDesc,
+								hashstate->ps.resultops,
+								inner_hashfuncid,
+								node->hashcollations,
+								hash->hashkeys,
+								hash_strict,
+								&hashstate->ps,
+								0,
+								HJ_FILL_INNER(hjstate));
+
+		/*
+		 * Set up the skew table hash function while we have a record of the
+		 * first key's hash function Oid.
+		 */
+		if (OidIsValid(hash->skewTable))
+		{
+			hashstate->skew_hashfunction = palloc0(sizeof(FmgrInfo));
+			hashstate->skew_collation = linitial_oid(node->hashcollations);
+			fmgr_info(outer_hashfuncid[0], hashstate->skew_hashfunction);
+		}
+
+		/* no need to keep these */
+		pfree(outer_hashfuncid);
+		pfree(inner_hashfuncid);
+		pfree(hash_strict);
 	}
 
 	/*
@@ -846,11 +931,6 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	hjstate->hj_CurSkewBucketNo = INVALID_SKEW_BUCKET_NO;
 	hjstate->hj_CurTuple = NULL;
 
-	hjstate->hj_OuterHashKeys = ExecInitExprList(node->hashkeys,
-												 (PlanState *) hjstate);
-	hjstate->hj_HashOperators = node->hashoperators;
-	hjstate->hj_Collations = node->hashcollations;
-
 	hjstate->hj_JoinState = HJ_BUILD_HASHTABLE;
 	hjstate->hj_MatchedOuter = false;
 	hjstate->hj_OuterNotEmpty = false;
@@ -918,17 +998,22 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode,
 
 		while (!TupIsNull(slot))
 		{
+			bool		isnull;
+
 			/*
 			 * We have to compute the tuple's hash value.
 			 */
 			ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
 
 			econtext->ecxt_outertuple = slot;
-			if (ExecHashGetHashValue(hashtable, econtext,
-									 hjstate->hj_OuterHashKeys,
-									 true,	/* outer tuple */
-									 HJ_FILL_OUTER(hjstate),
-									 hashvalue))
+
+			ResetExprContext(econtext);
+
+			*hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+																  econtext,
+																  &isnull));
+
+			if (!isnull)
 			{
 				/* remember outer relation is not empty for possible rescan */
 				hjstate->hj_OuterNotEmpty = true;
@@ -989,14 +1074,19 @@ ExecParallelHashJoinOuterGetTuple(PlanState *outerNode,
 
 		while (!TupIsNull(slot))
 		{
+			bool		isnull;
+
 			ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
 
 			econtext->ecxt_outertuple = slot;
-			if (ExecHashGetHashValue(hashtable, econtext,
-									 hjstate->hj_OuterHashKeys,
-									 true,	/* outer tuple */
-									 HJ_FILL_OUTER(hjstate),
-									 hashvalue))
+
+			ResetExprContext(econtext);
+
+			*hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+																  econtext,
+																  &isnull));
+
+			if (!isnull)
 				return slot;
 
 			/*
@@ -1518,15 +1608,20 @@ ExecParallelHashJoinPartitionOuter(HashJoinState *hjstate)
 	/* Execute outer plan, writing all tuples to shared tuplestores. */
 	for (;;)
 	{
+		bool		isnull;
+
 		slot = ExecProcNode(outerState);
 		if (TupIsNull(slot))
 			break;
 		econtext->ecxt_outertuple = slot;
-		if (ExecHashGetHashValue(hashtable, econtext,
-								 hjstate->hj_OuterHashKeys,
-								 true,	/* outer tuple */
-								 HJ_FILL_OUTER(hjstate),
-								 &hashvalue))
+
+		ResetExprContext(econtext);
+
+		hashvalue = DatumGetUInt32(ExecEvalExprSwitchContext(hjstate->hj_OuterHash,
+															 econtext,
+															 &isnull));
+
+		if (!isnull)
 		{
 			int			batchno;
 			int			bucketno;
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 27f94f9007..48ccdb942a 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -1900,6 +1900,210 @@ llvm_compile_expr(ExprState *state)
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
+			case EEOP_HASHDATUM_SET_INITVAL:
+				{
+					LLVMValueRef v_initvalue;
+
+					v_initvalue = l_sizet_const(op->d.hashdatum_initvalue.init_value);
+
+					LLVMBuildStore(b, v_initvalue, v_resvaluep);
+					LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}
+
+			case EEOP_HASHDATUM_FIRST:
+			case EEOP_HASHDATUM_FIRST_STRICT:
+			case EEOP_HASHDATUM_NEXT32:
+			case EEOP_HASHDATUM_NEXT32_STRICT:
+				{
+					FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+					LLVMValueRef v_fcinfo;
+					LLVMValueRef v_fcinfo_isnull;
+					LLVMValueRef v_retval;
+					LLVMBasicBlockRef b_checkargnull;
+					LLVMBasicBlockRef b_ifnotnull;
+					LLVMBasicBlockRef b_ifnullblock;
+					LLVMValueRef v_argisnull;
+					LLVMValueRef v_prevhash = NULL;
+
+					/*
+					 * When performing the next hash and not in strict mode we
+					 * perform a rotation of the previously stored hash value
+					 * before doing the NULL check.  We want to do this even
+					 * when we receive a NULL Datum to hash.  In strict mode,
+					 * we do this after the NULL check so as not to waste the
+					 * effort of rotating the bits when we're going to throw
+					 * away the hash value and return NULL.
+					 */
+					if (opcode == EEOP_HASHDATUM_NEXT32)
+					{
+						LLVMValueRef v_tmp1;
+						LLVMValueRef v_tmp2;
+
+						/*
+						 * Fetch the previously hashed value from where the
+						 * EEOP_HASHDATUM_FIRST operation stored it.
+						 */
+						v_prevhash = l_load(b, TypeSizeT, v_resvaluep,
+											"prevhash");
+
+						/*
+						 * Rotate bits left by 1 bit.  Be careful not to
+						 * overflow uint32 when working with size_t.
+						 */
+						v_tmp1 = LLVMBuildShl(b, v_prevhash, l_sizet_const(1),
+											  "");
+						v_tmp1 = LLVMBuildAnd(b, v_tmp1,
+											  l_sizet_const(0xffffffff), "");
+						v_tmp2 = LLVMBuildLShr(b, v_prevhash,
+											   l_sizet_const(31), "");
+						v_prevhash = LLVMBuildOr(b, v_tmp1, v_tmp2,
+												 "rotatedhash");
+					}
+
+					/*
+					 * Block for the actual function call, if args are
+					 * non-NULL.
+					 */
+					b_ifnotnull = l_bb_before_v(opblocks[opno + 1],
+												"b.%d.ifnotnull",
+												opno);
+
+					/* we expect the hash function to have 1 argument */
+					if (fcinfo->nargs != 1)
+						elog(ERROR, "incorrect number of function arguments");
+
+					v_fcinfo = l_ptr_const(fcinfo,
+										   l_ptr(StructFunctionCallInfoData));
+
+					b_checkargnull = l_bb_before_v(b_ifnotnull,
+												   "b.%d.isnull.0", opno);
+
+					LLVMBuildBr(b, b_checkargnull);
+
+					/*
+					 * Determine what to do if we find the argument to be
+					 * NULL.
+					 */
+					if (opcode == EEOP_HASHDATUM_FIRST_STRICT ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						b_ifnullblock = l_bb_before_v(b_ifnotnull,
+													  "b.%d.strictnull",
+													  opno);
+
+						LLVMPositionBuilderAtEnd(b, b_ifnullblock);
+
+						/*
+						 * In strict node, NULL inputs result in NULL.  Save
+						 * the NULL result and goto jumpdone.
+						 */
+						LLVMBuildStore(b, l_sbool_const(1), v_resnullp);
+						LLVMBuildStore(b, l_sizet_const(0), v_resvaluep);
+						LLVMBuildBr(b, opblocks[op->d.hashdatum.jumpdone]);
+					}
+					else
+					{
+						b_ifnullblock = l_bb_before_v(b_ifnotnull,
+													  "b.%d.null",
+													  opno);
+
+						LLVMPositionBuilderAtEnd(b, b_ifnullblock);
+
+
+						LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+
+						if (opcode == EEOP_HASHDATUM_NEXT32)
+						{
+							Assert(v_prevhash != NULL);
+
+							/*
+							 * Save the rotated hash value and skip to the
+							 * next op.
+							 */
+							LLVMBuildStore(b, v_prevhash, v_resvaluep);
+						}
+						else
+						{
+							Assert(opcode == EEOP_HASHDATUM_FIRST);
+
+							/*
+							 * Store a zero Datum when the Datum to hash is
+							 * NULL
+							 */
+							LLVMBuildStore(b, l_sizet_const(0), v_resvaluep);
+						}
+
+						LLVMBuildBr(b, opblocks[opno + 1]);
+					}
+
+					LLVMPositionBuilderAtEnd(b, b_checkargnull);
+
+					/* emit code to check if the input parameter is NULL */
+					v_argisnull = l_funcnull(b, v_fcinfo, 0);
+					LLVMBuildCondBr(b,
+									LLVMBuildICmp(b,
+												  LLVMIntEQ,
+												  v_argisnull,
+												  l_sbool_const(1),
+												  ""),
+									b_ifnullblock,
+									b_ifnotnull);
+
+					LLVMPositionBuilderAtEnd(b, b_ifnotnull);
+
+					/*
+					 * Rotate the previously stored hash value when performing
+					 * NEXT32 in strict mode.  In non-strict mode we already
+					 * did this before checking for NULLs.
+					 */
+					if (opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+					{
+						LLVMValueRef v_tmp1;
+						LLVMValueRef v_tmp2;
+
+						/*
+						 * Fetch the previously hashed value from where the
+						 * EEOP_HASHDATUM_FIRST_STRICT operation stored it.
+						 */
+						v_prevhash = l_load(b, TypeSizeT, v_resvaluep,
+											"prevhash");
+
+						/*
+						 * Rotate bits left by 1 bit.  Be careful not to
+						 * overflow uint32 when working with size_t.
+						 */
+						v_tmp1 = LLVMBuildShl(b, v_prevhash, l_sizet_const(1),
+											  "");
+						v_tmp1 = LLVMBuildAnd(b, v_tmp1,
+											  l_sizet_const(0xffffffff), "");
+						v_tmp2 = LLVMBuildLShr(b, v_prevhash,
+											   l_sizet_const(31), "");
+						v_prevhash = LLVMBuildOr(b, v_tmp1, v_tmp2,
+												 "rotatedhash");
+					}
+
+					/* call the hash function */
+					v_retval = BuildV1Call(context, b, mod, fcinfo,
+										   &v_fcinfo_isnull);
+
+					/*
+					 * For NEXT32 ops, XOR (^) the returned hash value with
+					 * the existing hash value.
+					 */
+					if (opcode == EEOP_HASHDATUM_NEXT32 ||
+						opcode == EEOP_HASHDATUM_NEXT32_STRICT)
+						v_retval = LLVMBuildXor(b, v_prevhash, v_retval,
+												"xorhash");
+
+					LLVMBuildStore(b, v_retval, v_resvaluep);
+					LLVMBuildStore(b, l_sbool_const(0), v_resnullp);
+
+					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}
+
 			case EEOP_CONVERT_ROWTYPE:
 				build_EvalXFunc(b, mod, "ExecEvalConvertRowtype",
 								v_state, op, v_econtext);
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 845f3422de..eec0aa699e 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -235,6 +235,13 @@ typedef enum ExprEvalOp
 	/* evaluate a single domain CHECK constraint */
 	EEOP_DOMAIN_CHECK,
 
+	/* evaluation steps for hashing */
+	EEOP_HASHDATUM_SET_INITVAL,
+	EEOP_HASHDATUM_FIRST,
+	EEOP_HASHDATUM_FIRST_STRICT,
+	EEOP_HASHDATUM_NEXT32,
+	EEOP_HASHDATUM_NEXT32_STRICT,
+
 	/* evaluate assorted special-purpose expression types */
 	EEOP_CONVERT_ROWTYPE,
 	EEOP_SCALARARRAYOP,
@@ -558,6 +565,23 @@ typedef struct ExprEvalStep
 			ErrorSaveContext *escontext;
 		}			domaincheck;
 
+		/* for EEOP_HASH_SET_INITVAL */
+		struct
+		{
+			Datum		init_value;
+
+		}			hashdatum_initvalue;
+
+		/* for EEOP_HASHDATUM_(FIRST|NEXT32)[_STRICT] */
+		struct
+		{
+			FmgrInfo   *finfo;	/* function's lookup data */
+			FunctionCallInfo fcinfo_data;	/* arguments etc */
+			/* faster to access without additional indirection: */
+			PGFunction	fn_addr;	/* actual call address */
+			int			jumpdone;	/* jump here on null */
+		}			hashdatum;
+
 		/* for EEOP_CONVERT_ROWTYPE */
 		struct
 		{
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9770752ea3..046a7fb69b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -285,6 +285,13 @@ extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
 extern List *ExecInitExprList(List *nodes, PlanState *parent);
 extern ExprState *ExecBuildAggTrans(AggState *aggstate, struct AggStatePerPhaseData *phase,
 									bool doSort, bool doHash, bool nullcheck);
+extern ExprState *ExecBuildHash32Expr(TupleDesc desc,
+									  const TupleTableSlotOps *ops,
+									  const Oid *hashfunc_oids,
+									  const List *collations,
+									  const List *hash_exprs,
+									  const bool *opstrict, PlanState *parent,
+									  uint32 init_value, bool keep_nulls);
 extern ExprState *ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 										 const TupleTableSlotOps *lops, const TupleTableSlotOps *rops,
 										 int numCols,
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 9197846cda..2d8ed8688c 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -313,8 +313,6 @@ typedef struct HashJoinTableData
 		dsa_pointer_atomic *shared;
 	}			buckets;
 
-	bool		keepNulls;		/* true to store unmatchable NULL tuples */
-
 	bool		skewEnabled;	/* are we using skew optimization? */
 	HashSkewBucket **skewBucket;	/* hashtable of skew buckets */
 	int			skewBucketLen;	/* size of skewBucket array (a power of 2!) */
@@ -343,16 +341,6 @@ typedef struct HashJoinTableData
 	BufFile   **innerBatchFile; /* buffered virtual temp file per batch */
 	BufFile   **outerBatchFile; /* buffered virtual temp file per batch */
 
-	/*
-	 * Info about the datatype-specific hash functions for the datatypes being
-	 * hashed. These are arrays of the same length as the number of hash join
-	 * clauses (hash keys).
-	 */
-	FmgrInfo   *outer_hashfunctions;	/* lookup data for hash functions */
-	FmgrInfo   *inner_hashfunctions;	/* lookup data for hash functions */
-	bool	   *hashStrict;		/* is each hash join operator strict? */
-	Oid		   *collations;
-
 	Size		spaceUsed;		/* memory space currently used by tuples */
 	Size		spaceAllowed;	/* upper limit for space used */
 	Size		spacePeak;		/* peak space used */
diff --git a/src/include/executor/nodeHash.h b/src/include/executor/nodeHash.h
index a95911c2fe..e4eb7bc635 100644
--- a/src/include/executor/nodeHash.h
+++ b/src/include/executor/nodeHash.h
@@ -24,8 +24,7 @@ extern Node *MultiExecHash(HashState *node);
 extern void ExecEndHash(HashState *node);
 extern void ExecReScanHash(HashState *node);
 
-extern HashJoinTable ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
-										 bool keepNulls);
+extern HashJoinTable ExecHashTableCreate(HashState *state);
 extern void ExecParallelHashTableAlloc(HashJoinTable hashtable,
 									   int batchno);
 extern void ExecHashTableDestroy(HashJoinTable hashtable);
@@ -43,12 +42,6 @@ extern void ExecParallelHashTableInsert(HashJoinTable hashtable,
 extern void ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable,
 													TupleTableSlot *slot,
 													uint32 hashvalue);
-extern bool ExecHashGetHashValue(HashJoinTable hashtable,
-								 ExprContext *econtext,
-								 List *hashkeys,
-								 bool outer_tuple,
-								 bool keep_nulls,
-								 uint32 *hashvalue);
 extern void ExecHashGetBucketAndBatch(HashJoinTable hashtable,
 									  uint32 hashvalue,
 									  int *bucketno,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 87f1519ec6..af7d8fd1e7 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2184,8 +2184,7 @@ typedef struct MergeJoinState
  *	 HashJoinState information
  *
  *		hashclauses				original form of the hashjoin condition
- *		hj_OuterHashKeys		the outer hash keys in the hashjoin condition
- *		hj_HashOperators		the join operators in the hashjoin condition
+ *		hj_OuterHash			ExprState for hashing outer keys
  *		hj_HashTable			hash table for the hashjoin
  *								(NULL if table not built yet)
  *		hj_CurHashValue			hash value for current outer tuple
@@ -2215,9 +2214,7 @@ typedef struct HashJoinState
 {
 	JoinState	js;				/* its first field is NodeTag */
 	ExprState  *hashclauses;
-	List	   *hj_OuterHashKeys;	/* list of ExprState nodes */
-	List	   *hj_HashOperators;	/* list of operator OIDs */
-	List	   *hj_Collations;
+	ExprState  *hj_OuterHash;
 	HashJoinTable hj_HashTable;
 	uint32		hj_CurHashValue;
 	int			hj_CurBucketNo;
@@ -2770,7 +2767,10 @@ typedef struct HashState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	HashJoinTable hashtable;	/* hash table for the hashjoin */
-	List	   *hashkeys;		/* list of ExprState nodes */
+	ExprState  *hash_expr;		/* ExprState to get hash value */
+
+	FmgrInfo   *skew_hashfunction;	/* lookup data for skew hash function */
+	Oid			skew_collation; /* collation to call skew_hashfunction with */
 
 	/*
 	 * In a parallelized hash join, the leader retains a pointer to the
-- 
2.34.1

#7David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#6)
Re: Speed up Hash Join by teaching ExprState about hashing

On Mon, 19 Aug 2024 at 18:41, David Rowley <dgrowleyml@gmail.com> wrote:

The attached v5 patch includes this change.

I made a few more tweaks to the comments and pushed the result.

Thank you both of you for having a look at this.

David

#8David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#7)
1 attachment(s)
Re: Speed up Hash Join by teaching ExprState about hashing

On Tue, 20 Aug 2024 at 13:40, David Rowley <dgrowleyml@gmail.com> wrote:

I made a few more tweaks to the comments and pushed the result.

While working on the patch to make HashAgg / hashed subplans and
hashed setops to use ExprState hashing, I discovered a bug in that
code when hashing 0 hashkeys (Possible with SELECT FROM t INTERSECT
SELECT FROM t;) where the initial value wasn't being stored properly.
I was storing it in the intermediate hash location expecting that the
step to process the final hash key would fetch that and store the
final result in ExprState.resvalue. That's clearly not going to work
when there are 0 expressions to hash. The correct thing to do is just
store the initial value in the ExprState.resvalue field directly when
there are zero expressions to hash.

The attached patch fixes the same mistake in ExecBuildHash32Expr().
This bug is only hypothetical as currently, nothing calls that
function passing a non-zero initial value.

I plan on pushing this shortly.

David

Attachments:

0001-Fix-hypothetical-bug-in-ExprState-building-for-hashi.patchtext/plain; charset=US-ASCII; name=0001-Fix-hypothetical-bug-in-ExprState-building-for-hashi.patchDownload
From 3bf35f185849e77f431fe2d351ce538cd1e9ca95 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Tue, 5 Nov 2024 12:29:33 +1300
Subject: [PATCH] Fix hypothetical bug in ExprState building for hashing

adf97c156 gave ExprStates the ability to hash expressions and return a
single hash value.  That commit supports seeding the hash value with an
initial value to have that blended into the final hash value.

Here we fix a hypothetical bug where if there are zero expressions to
hash, the initial value is stored in the wrong location.  The existing
code stored the initial value in an intermediate location expecting that
when the expressions were hashed that those steps would store the final
hash value in the ExprState.resvalue field.  However, that wouldn't happen
when there are zero expressions to hash.  The correct thing to do instead
is to have a special case for zero expressions and when we hit that case,
store the initial value directly in the ExprState.resvalue.  The reason
that this is a hypothetical bug is that no code currently calls
ExecBuildHash32Expr passing a non-zero initial value.
---
 src/backend/executor/execExpr.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 45954d979f..8f7a534005 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -4004,8 +4004,9 @@ ExecBuildHash32Expr(TupleDesc desc, const TupleTableSlotOps *ops,
 	ListCell   *lc2;
 	intptr_t	strict_opcode;
 	intptr_t	opcode;
+	int			num_exprs = list_length(hash_exprs);
 
-	Assert(list_length(hash_exprs) == list_length(collations));
+	Assert(num_exprs == list_length(collations));
 
 	state->parent = parent;
 
@@ -4013,11 +4014,11 @@ ExecBuildHash32Expr(TupleDesc desc, const TupleTableSlotOps *ops,
 	ExecCreateExprSetupSteps(state, (Node *) hash_exprs);
 
 	/*
-	 * When hashing more than 1 expression or if we have an init value, we
-	 * need somewhere to store the intermediate hash value so that it's
-	 * available to be combined with the result of subsequent hashing.
+	 * Make a place to store intermediate hash values between subsequent
+	 * hashing of individual expressions.  We only need this if there is more
+	 * than one expression to hash or an initial value plus one expression.
 	 */
-	if (list_length(hash_exprs) > 1 || init_value != 0)
+	if ((int64) num_exprs + (init_value != 0) > 1)
 		iresult = palloc(sizeof(NullableDatum));
 
 	if (init_value == 0)
@@ -4032,11 +4033,15 @@ ExecBuildHash32Expr(TupleDesc desc, const TupleTableSlotOps *ops,
 	}
 	else
 	{
-		/* Set up operation to set the initial value. */
+		/*
+		 * Set up operation to set the initial value.  Normally we store this
+		 * in the intermediate hash value location, but if there are no exprs
+		 * to hash, store it in the ExprState's result field.
+		 */
 		scratch.opcode = EEOP_HASHDATUM_SET_INITVAL;
 		scratch.d.hashdatum_initvalue.init_value = UInt32GetDatum(init_value);
-		scratch.resvalue = &iresult->value;
-		scratch.resnull = &iresult->isnull;
+		scratch.resvalue = num_exprs > 0 ? &iresult->value : &state->resvalue;
+		scratch.resnull = num_exprs > 0 ? &iresult->isnull : &state->resnull;
 
 		ExprEvalPushStep(state, &scratch);
 
@@ -4074,7 +4079,7 @@ ExecBuildHash32Expr(TupleDesc desc, const TupleTableSlotOps *ops,
 						&fcinfo->args[0].value,
 						&fcinfo->args[0].isnull);
 
-		if (i == list_length(hash_exprs) - 1)
+		if (i == num_exprs - 1)
 		{
 			/* the result for hashing the final expr is stored in the state */
 			scratch.resvalue = &state->resvalue;
-- 
2.34.1