Enable using IS NOT DISTINCT FROM in hash and merge joins
Hello,
We are using PostgreSQL to execute some SQL scripts "auto translated" from HIVE QL, where the join operator "<=>" is heavily used. The semantic same operator in PostgreSQL is "IS NOT DISTINCT FROM".
However, we found when "IS NOT DISTINCT FROM" is used in joins, only nested loop plan can be generated, which is confirmed here /messages/by-id/13950.1511879733@sss.pgh.pa.us and here https://postgrespro.com/list/thread-id/2059856 .
In another discussion someone suggests using coalesce(...) to replace NULLs to some special value, but in similar situation as in that thread, we have no reliable way to conclude a special value for any expression.
So I hacked the PG10 code to support using "IS NOT DISTINCT FROM" in hash and merge joins (not touching the indexes). It works in our environment, but I want to know if my approach is making sense, or is going to make damage.
There are 6 kinds of changes, and to be honest, none of them I am confident is doing in correct way...so please advise:
- I do this by first reversing the meaning of DistinctExpr, from "IS DISTINCT FROM" to "IS NOT DISTINCT FROM", which will be simpler to process in joins, because "IS NOT DISTINCT FROM" is closer to "=". (backend/parser/parse_expr.c, backend/utils/adt/ruleutils.c)
- The execution logic of DistinctExpr internally already reverts the result, because above change cancels it out, I revert it back. (backend/executor/execExprInterp.c, backend/optimizer/path/clausesel.c)
- In hash joins, I need to tell the executor that "NULL matches NULL" when the operator is "IS NOT DISTINCT FROM". I cannot figure out the best way for passing such information down, so I just ORed 0x8000000 to the operator Oid List. As no code in other parts is doing so, please advise a better approach, should I add a Bitmapset to pass the flags? Or should I define a new Node type to include both Oid and a bool flag? (backend/executor/nodeHashjoin.c, backend/executor/nodeHash.c)
- To support merge join, I added a nulleqnull bool flag in SortSupportData to bypass the "stop merging earlier when NULLs is reached" logic when the join operator is DistinctExpr. I think there is a padding gap after "bool abbreviate;", so I add the new flag after that, just want to keep binary compatibility in case something depends on it... (backend/executor/nodeMergejoin.c, include/utils/sortsupport.h)
- In create_join_clause, reconsider_outer_join_clause, and reconsider_full_join_clause functions, the derived expression generated by call to build_implied_join_equality outputs OpExpr for DistictExpr, because they are same in definition, I just patch the resulting node back to DistinctExpr if input is DistinctExpr. (backend/optimizer/path/equivclass.c)
- All other changes are for necessary code paths only allow OpExpr, I added logic to allow DistinctExpr too.
The patch in attachment is based on commit 821200405cc3f25fda28c5f58d17d640e25559b8.
Thanks!
Gao, Chi
Beijing Microfun Co. Ltd.
Attachments:
enable_is_not_distinct_from_in_hash_and_merge_joins.patchapplication/octet-stream; name=enable_is_not_distinct_from_in_hash_and_merge_joins.patchDownload
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index f2a52f62..da2282f 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -1155,13 +1155,13 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
if (fcinfo->argnull[0] && fcinfo->argnull[1])
{
/* Both NULL? Then is not distinct... */
- *op->resvalue = BoolGetDatum(false);
+ *op->resvalue = BoolGetDatum(true);
*op->resnull = false;
}
else if (fcinfo->argnull[0] || fcinfo->argnull[1])
{
/* Only one is NULL? Then is distinct... */
- *op->resvalue = BoolGetDatum(true);
+ *op->resvalue = BoolGetDatum(false);
*op->resnull = false;
}
else
@@ -1171,8 +1171,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
fcinfo->isnull = false;
eqresult = (op->d.func.fn_addr) (fcinfo);
- /* Must invert result of "="; safe to do even if null */
- *op->resvalue = BoolGetDatum(!DatumGetBool(eqresult));
+ *op->resvalue = eqresult;
*op->resnull = fcinfo->isnull;
}
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 75637d5..3a99581 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -342,13 +342,16 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
Oid hashop = lfirst_oid(ho);
Oid left_hashfn;
Oid right_hashfn;
+ bool isDistinct = (hashop & 0x80000000) != 0;
+
+ hashop &= 0x7FFFFFFF;
if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
elog(ERROR, "could not find hash function for hash operator %u",
hashop);
fmgr_info(left_hashfn, &hashtable->outer_hashfunctions[i]);
fmgr_info(right_hashfn, &hashtable->inner_hashfunctions[i]);
- hashtable->hashStrict[i] = op_strict(hashop);
+ hashtable->hashStrict[i] = op_strict(hashop) && !isDistinct;
i++;
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index ab1632c..e9f9721 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -520,13 +520,14 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hoperators = NIL;
foreach(l, node->hashclauses)
{
- OpExpr *hclause = lfirst_node(OpExpr, l);
+ OpExpr *hclause = (OpExpr *) lfirst(l);
+ Assert(IsA(hclause, OpExpr) || IsA(hclause, DistinctExpr));
lclauses = lappend(lclauses, ExecInitExpr(linitial(hclause->args),
(PlanState *) hjstate));
rclauses = lappend(rclauses, ExecInitExpr(lsecond(hclause->args),
(PlanState *) hjstate));
- hoperators = lappend_oid(hoperators, hclause->opno);
+ hoperators = lappend_oid(hoperators, (IsA(hclause, DistinctExpr) ? 0x80000000 : 0) | hclause->opno);
}
hjstate->hj_OuterHashKeys = lclauses;
hjstate->hj_InnerHashKeys = rclauses;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index eef0382..8381d0a 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -201,7 +201,7 @@ MJExamineQuals(List *mergeclauses,
Oid op_righttype;
Oid sortfunc;
- if (!IsA(qual, OpExpr))
+ if (!IsA(qual, OpExpr) && !IsA(qual, DistinctExpr))
elog(ERROR, "mergejoin clause is not an OpExpr");
/*
@@ -262,6 +262,10 @@ MJExamineQuals(List *mergeclauses,
/* We'll use a shim to call the old-style btree comparator */
PrepareSortSupportComparisonShim(sortfunc, &clause->ssup);
}
+ if (IsA(qual, DistinctExpr))
+ {
+ clause->ssup.nulleqnull = true;
+ }
iClause++;
}
@@ -315,7 +319,7 @@ MJEvalOuterValues(MergeJoinState *mergestate)
clause->ldatum = ExecEvalExpr(clause->lexpr, econtext,
&clause->lisnull);
- if (clause->lisnull)
+ if (clause->lisnull && !clause->ssup.nulleqnull)
{
/* match is impossible; can we end the join early? */
if (i == 0 && !clause->ssup.ssup_nulls_first &&
@@ -362,7 +366,7 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
clause->rdatum = ExecEvalExpr(clause->rexpr, econtext,
&clause->risnull);
- if (clause->risnull)
+ if (clause->risnull && !clause->ssup.nulleqnull)
{
/* match is impossible; can we end the join early? */
if (i == 0 && !clause->ssup.ssup_nulls_first &&
@@ -412,7 +416,7 @@ MJCompare(MergeJoinState *mergestate)
/*
* Special case for NULL-vs-NULL, else use standard comparison.
*/
- if (clause->lisnull && clause->risnull)
+ if (clause->lisnull && clause->risnull && !clause->ssup.nulleqnull)
{
nulleqnull = true; /* NULL "=" NULL */
continue;
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 9d34025..aaddfda 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -748,15 +748,6 @@ clause_selectivity(PlannerInfo *root,
opclause->inputcollid,
varRelid);
}
-
- /*
- * DistinctExpr has the same representation as OpExpr, but the
- * contained operator is "=" not "<>", so we must negate the result.
- * This estimation method doesn't give the right behavior for nulls,
- * but it's better than doing nothing.
- */
- if (IsA(clause, DistinctExpr))
- s1 = 1.0 - s1;
}
else if (IsA(clause, ScalarArrayOpExpr))
{
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index cc7f247..a5a1b39 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -63,6 +63,12 @@ static bool reconsider_outer_join_clause(PlannerInfo *root,
bool outer_on_left);
static bool reconsider_full_join_clause(PlannerInfo *root,
RestrictInfo *rinfo);
+static RestrictInfo * build_implied_join_equality_distinct(Oid opno,
+ EquivalenceClass *ec,
+ Expr *item1,
+ Expr *item2,
+ Relids qualscope,
+ Relids nullable_relids);
/*
@@ -134,7 +140,7 @@ process_equivalence(PlannerInfo *root, RestrictInfo *restrictinfo,
return false;
/* Extract info from given clause */
- Assert(is_opclause(clause));
+ Assert(is_opclause(clause) || IsA(clause, DistinctExpr));
opno = ((OpExpr *) clause)->opno;
collation = ((OpExpr *) clause)->inputcollid;
item1 = (Expr *) get_leftop(clause);
@@ -1420,15 +1426,14 @@ create_join_clause(PlannerInfo *root,
*/
oldcontext = MemoryContextSwitchTo(root->planner_cxt);
- rinfo = build_implied_join_equality(opno,
- ec->ec_collation,
+ rinfo = build_implied_join_equality_distinct(opno,
+ ec,
leftem->em_expr,
rightem->em_expr,
bms_union(leftem->em_relids,
rightem->em_relids),
bms_union(leftem->em_nullable_relids,
- rightem->em_nullable_relids),
- ec->ec_min_security);
+ rightem->em_nullable_relids));
/* Mark the clause as redundant, or not */
rinfo->parent_ec = parent_ec;
@@ -1654,7 +1659,7 @@ reconsider_outer_join_clause(PlannerInfo *root, RestrictInfo *rinfo,
inner_nullable_relids;
ListCell *lc1;
- Assert(is_opclause(rinfo->clause));
+ Assert(is_opclause(rinfo->clause) || IsA(rinfo->clause, DistinctExpr));
opno = ((OpExpr *) rinfo->clause)->opno;
collation = ((OpExpr *) rinfo->clause)->inputcollid;
@@ -1734,13 +1739,12 @@ reconsider_outer_join_clause(PlannerInfo *root, RestrictInfo *rinfo,
cur_em->em_datatype);
if (!OidIsValid(eq_op))
continue; /* can't generate equality */
- newrinfo = build_implied_join_equality(eq_op,
- cur_ec->ec_collation,
+ newrinfo = build_implied_join_equality_distinct(eq_op,
+ cur_ec,
innervar,
cur_em->em_expr,
bms_copy(inner_relids),
- bms_copy(inner_nullable_relids),
- cur_ec->ec_min_security);
+ bms_copy(inner_nullable_relids));
if (process_equivalence(root, newrinfo, true))
match = true;
}
@@ -1784,7 +1788,7 @@ reconsider_full_join_clause(PlannerInfo *root, RestrictInfo *rinfo)
return false;
/* Extract needed info from the clause */
- Assert(is_opclause(rinfo->clause));
+ Assert(is_opclause(rinfo->clause) || IsA(rinfo->clause, DistinctExpr));
opno = ((OpExpr *) rinfo->clause)->opno;
collation = ((OpExpr *) rinfo->clause)->inputcollid;
op_input_types(opno, &left_type, &right_type);
@@ -1877,13 +1881,12 @@ reconsider_full_join_clause(PlannerInfo *root, RestrictInfo *rinfo)
cur_em->em_datatype);
if (OidIsValid(eq_op))
{
- newrinfo = build_implied_join_equality(eq_op,
- cur_ec->ec_collation,
+ newrinfo = build_implied_join_equality_distinct(eq_op,
+ cur_ec,
leftvar,
cur_em->em_expr,
bms_copy(left_relids),
- bms_copy(left_nullable_relids),
- cur_ec->ec_min_security);
+ bms_copy(left_nullable_relids));
if (process_equivalence(root, newrinfo, true))
matchleft = true;
}
@@ -1892,13 +1895,12 @@ reconsider_full_join_clause(PlannerInfo *root, RestrictInfo *rinfo)
cur_em->em_datatype);
if (OidIsValid(eq_op))
{
- newrinfo = build_implied_join_equality(eq_op,
- cur_ec->ec_collation,
+ newrinfo = build_implied_join_equality_distinct(eq_op,
+ cur_ec,
rightvar,
cur_em->em_expr,
bms_copy(right_relids),
- bms_copy(right_nullable_relids),
- cur_ec->ec_min_security);
+ bms_copy(right_nullable_relids));
if (process_equivalence(root, newrinfo, true))
matchright = true;
}
@@ -2472,3 +2474,33 @@ is_redundant_derived_clause(RestrictInfo *rinfo, List *clauselist)
return false;
}
+
+
+static RestrictInfo *
+build_implied_join_equality_distinct(Oid opno,
+ EquivalenceClass *ec,
+ Expr *item1,
+ Expr *item2,
+ Relids qualscope,
+ Relids nullable_relids)
+{
+ ListCell *lc;
+ RestrictInfo *rinfo;
+ RestrictInfo *newrinfo = build_implied_join_equality(opno,
+ ec->ec_collation,
+ item1,
+ item2,
+ qualscope,
+ nullable_relids,
+ ec->ec_min_security);
+
+
+ foreach(lc, ec->ec_sources)
+ {
+ rinfo = (RestrictInfo *)lfirst(lc);
+ if (IsA(rinfo->clause, DistinctExpr))
+ NodeSetTag(newrinfo->clause, T_DistinctExpr);
+ }
+
+ return newrinfo;
+}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 30bfde4..81ee469 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -693,7 +693,7 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* caller's mergejoinability test should have selected only
* OpExprs.
*/
- op = castNode(OpExpr, rinfo->clause)->opno;
+ op = IsA(rinfo->clause, DistinctExpr) ? castNode(DistinctExpr, rinfo->clause)->opno : castNode(OpExpr, rinfo->clause)->opno;
/* caller identified the inner side for us */
if (rinfo->outer_is_left)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 73cc37f..475e817 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4141,7 +4141,7 @@ create_hashjoin_plan(PlannerInfo *root,
OpExpr *clause = (OpExpr *) linitial(hashclauses);
Node *node;
- Assert(is_opclause(clause));
+ Assert(is_opclause(clause) || IsA(clause, DistinctExpr));
node = (Node *) linitial(clause->args);
if (IsA(node, RelabelType))
node = (Node *) ((RelabelType *) node)->arg;
@@ -4697,7 +4697,7 @@ get_switched_clauses(List *clauses, Relids outerrelids)
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
OpExpr *clause = (OpExpr *) restrictinfo->clause;
- Assert(is_opclause(clause));
+ Assert(is_opclause(clause) || IsA(clause, DistinctExpr));
if (bms_is_subset(restrictinfo->right_relids, outerrelids))
{
/*
@@ -4705,7 +4705,7 @@ get_switched_clauses(List *clauses, Relids outerrelids)
* clause without changing the original list. Could use
* copyObject, but a complete deep copy is overkill.
*/
- OpExpr *temp = makeNode(OpExpr);
+ OpExpr *temp = IsA(clause, DistinctExpr) ? makeNode(DistinctExpr) : makeNode(OpExpr);
temp->opno = clause->opno;
temp->opfuncid = InvalidOid;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 4d0cc2a..e3e255a 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2591,7 +2591,7 @@ check_mergejoinable(RestrictInfo *restrictinfo)
if (restrictinfo->pseudoconstant)
return;
- if (!is_opclause(clause))
+ if (!is_opclause(clause) && !IsA(clause, DistinctExpr))
return;
if (list_length(((OpExpr *) clause)->args) != 2)
return;
@@ -2628,7 +2628,7 @@ check_hashjoinable(RestrictInfo *restrictinfo)
if (restrictinfo->pseudoconstant)
return;
- if (!is_opclause(clause))
+ if (!is_opclause(clause) && !IsA(clause, DistinctExpr))
return;
if (list_length(((OpExpr *) clause)->args) != 2)
return;
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index fa53b7a..0b75cec 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2260,7 +2260,7 @@ CommuteOpExpr(OpExpr *clause)
Node *temp;
/* Sanity checks: caller is at fault if these fail */
- if (!is_opclause(clause) ||
+ if ((!is_opclause(clause) && !IsA(clause, DistinctExpr)) ||
list_length(clause->args) != 2)
elog(ERROR, "cannot commute non-binary-operator clause");
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index d5652b0..8f3f668 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -133,7 +133,7 @@ make_restrictinfo_internal(Expr *clause,
* If it's a binary opclause, set up left/right relids info. In any case
* set up the total clause relids info.
*/
- if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ if ((is_opclause(clause) || IsA(clause, DistinctExpr)) && list_length(((OpExpr *) clause)->args) == 2)
{
restrictinfo->left_relids = pull_varnos(get_leftop(clause));
restrictinfo->right_relids = pull_varnos(get_rightop(clause));
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 6d8cb07..21ee4ef 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -1035,7 +1035,7 @@ transformAExprDistinct(ParseState *pstate, A_Expr *a)
* If it's NOT DISTINCT, we first build a DistinctExpr and then stick a
* NOT on top.
*/
- if (a->kind == AEXPR_NOT_DISTINCT)
+ if (a->kind == AEXPR_DISTINCT)
result = (Node *) makeBoolExpr(NOT_EXPR,
list_make1(result),
a->location);
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 8e1bee5..b5311ca 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -7760,7 +7760,7 @@ get_rule_expr(Node *node, deparse_context *context,
if (!PRETTY_PAREN(context))
appendStringInfoChar(buf, '(');
get_rule_expr_paren(arg1, context, true, node);
- appendStringInfoString(buf, " IS DISTINCT FROM ");
+ appendStringInfoString(buf, " IS NOT DISTINCT FROM ");
get_rule_expr_paren(arg2, context, true, node);
if (!PRETTY_PAREN(context))
appendStringInfoChar(buf, ')');
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 3d633de..9d0f2df 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2878,7 +2878,7 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
*leftend = *rightend = 1.0;
/* Deconstruct the merge clause */
- if (!is_opclause(clause))
+ if (!is_opclause(clause) && !IsA(clause, DistinctExpr))
return; /* shouldn't happen */
opno = ((OpExpr *) clause)->opno;
left = get_leftop((Expr *) clause);
diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index 6e8444b..57f5fae 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -155,6 +155,8 @@ typedef struct SortSupportData
*/
bool abbreviate;
+ bool nulleqnull;
+
/*
* Converter to abbreviated format, from original representation. Core
* code uses this callback to convert from a pass-by-reference "original"
Hi,
I heard from my colleagues that we have an extension that does a similar
thing. I'm attaching the source. It creates operator "==" for some data
types, that works like "IS NOT DISTINCT FROM". You can then perform hash
joins on this operator. Apparently the hash join machinery supports
non-strict operators, but I'm not sure about the hash indexes.
I think the question we have to answer is what prevents us from having
hash and merge joins on non-strict operators? Hash joins seem to work
already, so we can just create a custom operator and use it. Merge join
executor can be adapted to work as well, but the planner would require
more complex changes. Just adding a check for DistinctExpr to
check_mergejoinable probably breaks equivalence classes. The problem
with merge join planning is that it has the notion of "mergejoinable
operator", which is a strict btree equality operator, and it uses such
operators both to perform merge joins and to conclude that some two
variables must be equal (that is, create equivalence classes). If we are
going to perform merge joins on some other kinds of operators, we have
to disentangle these two uses. I had to do this to support merge joins
on inequality clauses, you can take a look at this thread if you wish:
/messages/by-id/b31e1a2d-5ed2-cbca-649e-136f1a7c4c31@postgrespro.ru
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company