Making clausesel.c Smarter

Started by David Rowleyalmost 9 years ago29 messages
#1David Rowley
david.rowley@2ndquadrant.com
1 attachment(s)

I've stumbled over an interesting query out in the wild where the
query was testing a nullable timestamptz column was earlier than some
point in time, and also checking the column IS NOT NULL.

The use pattern here was that records which required processing had
this timestamp set to something other than NULL, a worker would come
along and process those, and UPDATE the record to NULL to mark the
fact that it was now processed. So what we are left with was a table
with a small number of rows with a value in this timestamp column, and
an ever-growing number of rows with a NULL value.

A highly simplified version of the query was checking for records that
required processing before some date, say:

SELECT * FROM a WHERE ts < '2017-02-24' AND ts IS NOT NULL;

Of course, the ts IS NOT NULL is not required here, but I can
understand how someone might make the mistake of adding this. The
simple solution to the problem was to have that null test removed, but
that seemingly innocent little mistake caused some pain due to very
slow running queries which held on to a nested loop plan 33 times
longer than it should have been doing. Basically what was happening
here is that clauselist_selectivity() was applying the selectivity
with the ~0.97 null_fact from pg_statistic over the top of the already
applied estimate on the number of rows below the constant timestamp.

Since the diagnosis of this problem was not instant, and some amount
of pain was suffered before the solution was found, I wondered if it
might be worth smartening up the planner a bit in this area.

We're already pretty good at handling conditions like: SELECT * FROM a
WHERE x < 10 and x < 1; where we'll effectively ignore the x < 10
estimate since x < 1 is more restrictive, so if we just build on that
ability a bit we could get enough code to cover the above case.

I've attached a draft patch which improves the situation here.

Given the test case:

create table ts (ts timestamptz);
insert into ts select case when x%1000=0 then '2017-01-01'::date +
(x::text || ' sec')::interval else null end from
generate_series(1,1000000) x ( x );
analyze ts;

explain analyze select count(*) from ts where ts <=
'2017-02-01'::timestamptz and ts is not null;

With the patch we get:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Aggregate (cost=15938.83..15938.84 rows=1 width=8) (actual
time=101.003..101.003 rows=1 loops=1)
-> Seq Scan on ts (cost=0.00..15937.00 rows=733 width=0) (actual
time=0.184..100.868 rows=1000 loops=1)
Filter: ((ts IS NOT NULL) AND (ts <= '2017-02-01
00:00:00+13'::timestamp with time zone))
Rows Removed by Filter: 999000
Planning time: 0.153 ms
Execution time: 101.063 ms

Whereas master gives us:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Aggregate (cost=15937.00..15937.01 rows=1 width=8) (actual
time=119.256..119.256 rows=1 loops=1)
-> Seq Scan on ts (cost=0.00..15937.00 rows=1 width=0) (actual
time=0.172..119.062 rows=1000 loops=1)
Filter: ((ts IS NOT NULL) AND (ts <= '2017-02-01
00:00:00+13'::timestamp with time zone))
Rows Removed by Filter: 999000
Planning time: 0.851 ms
Execution time: 119.401 ms

A side effect of this is that we're now able to better detect
impossible cases such as:

postgres=# explain analyze select count(*) from ts where ts <=
'2017-02-01'::timestamptz and ts is null;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Aggregate (cost=15937.00..15937.01 rows=1 width=8) (actual
time=135.012..135.012 rows=1 loops=1)
-> Seq Scan on ts (cost=0.00..15937.00 rows=1 width=0) (actual
time=135.007..135.007 rows=0 loops=1)
Filter: ((ts IS NULL) AND (ts <= '2017-02-01
00:00:00+13'::timestamp with time zone))
Rows Removed by Filter: 1000000
Planning time: 0.067 ms
Execution time: 135.050 ms
(6 rows)

Master is not able to see the impossibility of this case:

postgres=# explain analyze select count(*) from ts where ts <=
'2017-02-01'::timestamptz and ts is null;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Aggregate (cost=15938.83..15938.84 rows=1 width=8) (actual
time=131.681..131.681 rows=1 loops=1)
-> Seq Scan on ts (cost=0.00..15937.00 rows=733 width=0) (actual
time=131.676..131.676 rows=0 loops=1)
Filter: ((ts IS NULL) AND (ts <= '2017-02-01
00:00:00+13'::timestamp with time zone))
Rows Removed by Filter: 1000000
Planning time: 0.090 ms
Execution time: 131.719 ms
(6 rows)

Now one thing I was unsure of in the patch was this "Other clauses"
concept that I've come up with. In the patch we have:

default:
addCachedSelectivityOtherClause(&cslist, var, s2);
break;

I was unsure if this should only apply to F_EQSEL and F_NEQSEL. My
imagination here won't stretch far enough to imagine a OpExpr which
passes with a NULL operand. If it did my logic is broken, and if
that's the case then I think limiting "Others" to mean F_EQSEL and
F_NEQSEL would be the workaround.

Before I do any more work on this, I'd like to know, is this something
that we might want to fix?

It would be good to improve the situation here in the back branches
too, but I'm thinking that the attached might be a little invasive for
that?

Thoughts?

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

smarter_clausesel.patchapplication/octet-stream; name=smarter_clausesel.patchDownload
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index af2934a..2766cc6 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -23,23 +23,38 @@
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
 
+#define CACHESEL_LOBOUND		0x0001	/* has var > something selectivity */
+#define CACHESEL_HIBOUND		0x0002	/* has var < something selectivity */
+#define CACHESEL_NULLTEST		0x0004	/* has var IS NULL selectivity */
+#define CACHESEL_NOTNULLTEST	0x0008	/* has var IS NOT NULL selectivity */
+#define CACHESEL_OTHER			0x0010	/* has another OpExpr selectivity */
 
 /*
- * Data structure for accumulating info about possible range-query
- * clause pairs in clauselist_selectivity.
+ * Data structure for caching selectivity for clauselist_selectivity.
  */
-typedef struct RangeQueryClause
+typedef struct CachedSelectivityClause
 {
-	struct RangeQueryClause *next;		/* next in linked list */
+	struct CachedSelectivityClause *next;		/* next in linked list */
 	Node	   *var;			/* The common variable of the clauses */
-	bool		have_lobound;	/* found a low-bound clause yet? */
-	bool		have_hibound;	/* found a high-bound clause yet? */
+	int			selmask;		/* Bitmask of which sel types are stored */
 	Selectivity lobound;		/* Selectivity of a var > something clause */
 	Selectivity hibound;		/* Selectivity of a var < something clause */
-} RangeQueryClause;
-
-static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
+	Selectivity nullsel;		/* Selectivity of a IS NULL test */
+	Selectivity notnullsel;		/* Selectivity of a IS NOT NULL test */
+	Selectivity othersel;		/* Selectivity of any other clause */
+} CachedSelectivityClause;
+
+
+static CachedSelectivityClause *findCachedSelectivityVar(
+			   CachedSelectivityClause **cslist, Node *var);
+static void addCachedSelectivityNullTest(CachedSelectivityClause **cslist,
+			   Node *var, Selectivity s2);
+static void addCachedSelectivityNotNullTest(CachedSelectivityClause **cslist,
+			   Node *var, Selectivity s2);
+static void addCachedSelectivityRangeVar(CachedSelectivityClause **cslist, Node *var,
 			   bool varonleft, bool isLTsel, Selectivity s2);
+static void addCachedSelectivityOtherClause(CachedSelectivityClause **cslist,
+			   Node *var, Selectivity s2);
 
 
 /****************************************************************************
@@ -60,27 +75,38 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
  * subclauses.  However, that's only right if the subclauses have independent
  * probabilities, and in reality they are often NOT independent.  So,
  * we want to be smarter where we can.
-
- * Currently, the only extra smarts we have is to recognize "range queries",
- * such as "x > 34 AND x < 42".  Clauses are recognized as possible range
- * query components if they are restriction opclauses whose operators have
+ *
+ * Currently we only handle OpExprs with 2 operands, and IS [NOT] NULL tests.
+ * We have extra smarts to recognize "range queries", such as
+ * "x > 34 AND x < 42".  These clauses are recognized as possible range query
+ * components if they are restriction opclauses whose operators have
  * scalarltsel() or scalargtsel() as their restriction selectivity estimator.
- * We pair up clauses of this form that refer to the same variable.  An
- * unpairable clause of this kind is simply multiplied into the selectivity
- * product in the normal way.  But when we find a pair, we know that the
- * selectivities represent the relative positions of the low and high bounds
- * within the column's range, so instead of figuring the selectivity as
- * hisel * losel, we can figure it as hisel + losel - 1.  (To visualize this,
- * see that hisel is the fraction of the range below the high bound, while
- * losel is the fraction above the low bound; so hisel can be interpreted
- * directly as a 0..1 value but we need to convert losel to 1-losel before
- * interpreting it as a value.  Then the available range is 1-losel to hisel.
- * However, this calculation double-excludes nulls, so really we need
- * hisel + losel + null_frac - 1.)
+ * IS [NOT] NULL tests are handled too, it's a common enough, and perhaps
+ * innocent enough looking mistake to include a IS NOT NULL test along with
+ * another condition for the same var in a query, so its useful in this case
+ * to treat the "IS NOT NULL" as a no-op. However, since we only collect
+ * OpExprs and NullTests, we're only going to detect this NullTest issue in
+ * the cases where the other found qual is an OpExpr.
+ *
+ * We collect up clauses so that we're able to conditionally apply each
+ * selectivity based on the other clauses which have been seen for the same
+ * Var.  Any clause which is unpaired is simply multiplied into the
+ * selectivity product in the normal way.  But when we find a pair, we can
+ * conditionally apply the selectivities.  This also allows us to detect weird
+ * corner cases such as "x = 1 AND x IS NULL".  With range pairs, we know that
+ * the selectivities represent the relative positions of the low and high
+ * bounds within the column's range, so instead of figuring the selectivity
+ * as hisel * losel, we can figure it as hisel + losel - 1.  (To visualize
+ * this, see that hisel is the fraction of the range below the high bound,
+ * while losel is the fraction above the low bound; so hisel can be
+ * interpreted directly as a 0..1 value but we need to convert losel to
+ * 1-losel before interpreting it as a value.  Then the available range is
+ * 1-losel to hisel. However, this calculation double-excludes nulls, so
+ * really we need hisel + losel + null_frac - 1.)
  *
- * If either selectivity is exactly DEFAULT_INEQ_SEL, we forget this equation
- * and instead use DEFAULT_RANGE_INEQ_SEL.  The same applies if the equation
- * yields an impossible (negative) result.
+ * For range pairs, if either selectivity is exactly DEFAULT_INEQ_SEL, we
+ * forget this equation and instead use DEFAULT_RANGE_INEQ_SEL.  The same
+ * applies if the equation yields an impossible (negative) result.
  *
  * A free side-effect is that we can recognize redundant inequalities such
  * as "x < 4 AND x < 5"; only the tighter constraint will be counted.
@@ -96,7 +122,7 @@ clauselist_selectivity(PlannerInfo *root,
 					   SpecialJoinInfo *sjinfo)
 {
 	Selectivity s1 = 1.0;
-	RangeQueryClause *rqlist = NULL;
+	CachedSelectivityClause *cslist = NULL;
 	ListCell   *l;
 
 	/*
@@ -140,6 +166,19 @@ clauselist_selectivity(PlannerInfo *root,
 		else
 			rinfo = NULL;
 
+		if (IsA(clause, NullTest))
+		{
+			NullTestType nulltesttype = ((NullTest *) clause)->nulltesttype;
+
+			if (nulltesttype == IS_NULL)
+				addCachedSelectivityNullTest(&cslist,
+									(Node *) ((NullTest *) clause)->arg, s2);
+			else if (nulltesttype == IS_NOT_NULL)
+				addCachedSelectivityNotNullTest(&cslist,
+									(Node *) ((NullTest *) clause)->arg, s2);
+
+			continue;		/* drop to loop bottom */
+		}
 		/*
 		 * See if it looks like a restriction clause with a pseudoconstant on
 		 * one side.  (Anything more complicated than that might not behave in
@@ -151,26 +190,34 @@ clauselist_selectivity(PlannerInfo *root,
 			OpExpr	   *expr = (OpExpr *) clause;
 			bool		varonleft = true;
 			bool		ok;
+			Node	   *leftexpr = (Node *) linitial(expr->args);
+			Node	   *rightexpr = (Node *) lsecond(expr->args);
 
 			if (rinfo)
 			{
 				ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
-					(is_pseudo_constant_clause_relids(lsecond(expr->args),
+					(is_pseudo_constant_clause_relids(rightexpr,
 													  rinfo->right_relids) ||
 					 (varonleft = false,
-					  is_pseudo_constant_clause_relids(linitial(expr->args),
+					  is_pseudo_constant_clause_relids(leftexpr,
 													   rinfo->left_relids)));
 			}
 			else
 			{
 				ok = (NumRelids(clause) == 1) &&
-					(is_pseudo_constant_clause(lsecond(expr->args)) ||
+					(is_pseudo_constant_clause(rightexpr) ||
 					 (varonleft = false,
-					  is_pseudo_constant_clause(linitial(expr->args))));
+					  is_pseudo_constant_clause(leftexpr)));
 			}
 
 			if (ok)
 			{
+				Node *var;
+				if (varonleft)
+					var = leftexpr;
+				else
+					var = rightexpr;
+
 				/*
 				 * If it's not a "<" or ">" operator, just merge the
 				 * selectivity in generically.  But if it's the right oprrest,
@@ -179,16 +226,15 @@ clauselist_selectivity(PlannerInfo *root,
 				switch (get_oprrest(expr->opno))
 				{
 					case F_SCALARLTSEL:
-						addRangeClause(&rqlist, clause,
+						addCachedSelectivityRangeVar(&cslist, var,
 									   varonleft, true, s2);
 						break;
 					case F_SCALARGTSEL:
-						addRangeClause(&rqlist, clause,
+						addCachedSelectivityRangeVar(&cslist, var,
 									   varonleft, false, s2);
 						break;
 					default:
-						/* Just merge the selectivity in generically */
-						s1 = s1 * s2;
+						addCachedSelectivityOtherClause(&cslist, var, s2);
 						break;
 				}
 				continue;		/* drop to loop bottom */
@@ -200,176 +246,252 @@ clauselist_selectivity(PlannerInfo *root,
 	}
 
 	/*
-	 * Now scan the rangequery pair list.
+	 * Now scan the cached selectivity list
 	 */
-	while (rqlist != NULL)
+	while (cslist != NULL)
 	{
-		RangeQueryClause *rqnext;
+		CachedSelectivityClause *csnext;
 
-		if (rqlist->have_lobound && rqlist->have_hibound)
+		if ((cslist->selmask & CACHESEL_NULLTEST))
 		{
-			/* Successfully matched a pair of range clauses */
-			Selectivity s2;
-
 			/*
-			 * Exact equality to the default value probably means the
-			 * selectivity function punted.  This is not airtight but should
-			 * be good enough.
+			 * if null test is not the only flag then there can be no matching
+			 * rows at all.
 			 */
-			if (rqlist->hibound == DEFAULT_INEQ_SEL ||
-				rqlist->lobound == DEFAULT_INEQ_SEL)
-			{
-				s2 = DEFAULT_RANGE_INEQ_SEL;
-			}
+			if (cslist->selmask != CACHESEL_NULLTEST)
+				s1 = 0;
 			else
-			{
-				s2 = rqlist->hibound + rqlist->lobound - 1.0;
+				s1 *= cslist->nullsel;
+		}
 
-				/* Adjust for double-exclusion of NULLs */
-				s2 += nulltestsel(root, IS_NULL, rqlist->var,
-								  varRelid, jointype, sjinfo);
+		/* apply the IS NOT NULL only if it's the only selectivity cached */
+		else if ((cslist->selmask & CACHESEL_NOTNULLTEST) == cslist->selmask)
+		{
+			s1 *= cslist->notnullsel;
+		}
+
+		else
+		{
+			if ((cslist->selmask & (CACHESEL_LOBOUND|CACHESEL_HIBOUND)) ==
+				(CACHESEL_LOBOUND|CACHESEL_HIBOUND))
+			{
+				/* Successfully matched a pair of range clauses */
+				Selectivity s2;
 
 				/*
-				 * A zero or slightly negative s2 should be converted into a
-				 * small positive value; we probably are dealing with a very
-				 * tight range and got a bogus result due to roundoff errors.
-				 * However, if s2 is very negative, then we probably have
-				 * default selectivity estimates on one or both sides of the
-				 * range that we failed to recognize above for some reason.
+				 * Exact equality to the default value probably means the
+				 * selectivity function punted.  This is not airtight but
+				 * should be good enough.
 				 */
-				if (s2 <= 0.0)
+				if (cslist->hibound == DEFAULT_INEQ_SEL ||
+					cslist->lobound == DEFAULT_INEQ_SEL)
 				{
-					if (s2 < -0.01)
-					{
-						/*
-						 * No data available --- use a default estimate that
-						 * is small, but not real small.
-						 */
-						s2 = DEFAULT_RANGE_INEQ_SEL;
-					}
-					else
+					s2 = DEFAULT_RANGE_INEQ_SEL;
+				}
+				else
+				{
+					s2 = cslist->hibound + cslist->lobound - 1.0;
+
+					/* Adjust for double-exclusion of NULLs */
+					s2 += nulltestsel(root, IS_NULL, cslist->var,
+									  varRelid, jointype, sjinfo);
+
+					/*
+					 * A zero or slightly negative s2 should be converted into
+					 * a small positive value; we probably are dealing with a
+					 * very tight range and got a bogus result due to roundoff
+					 * errors. However, if s2 is very negative, then we
+					 * probably have default selectivity estimates on one or
+					 * both sides of the range that we failed to recognize
+					 * above for some reason.
+					 */
+					if (s2 <= 0.0)
 					{
-						/*
-						 * It's just roundoff error; use a small positive
-						 * value
-						 */
-						s2 = 1.0e-10;
+						if (s2 < -0.01)
+						{
+							/*
+							 * No data available --- use a default estimate
+							 * that is small, but not real small.
+							 */
+							s2 = DEFAULT_RANGE_INEQ_SEL;
+						}
+						else
+						{
+							/*
+							 * It's just roundoff error; use a small positive
+							 * value
+							 */
+							s2 = 1.0e-10;
+						}
 					}
 				}
+				/* Merge in the selectivity of the pair of clauses */
+				s1 *= s2;
 			}
-			/* Merge in the selectivity of the pair of clauses */
-			s1 *= s2;
-		}
-		else
-		{
-			/* Only found one of a pair, merge it in generically */
-			if (rqlist->have_lobound)
-				s1 *= rqlist->lobound;
 			else
-				s1 *= rqlist->hibound;
+			{
+				/* Only found one of a range pair, merge it in generically */
+				if ((cslist->selmask & CACHESEL_LOBOUND))
+					s1 *= cslist->lobound;
+				else if ((cslist->selmask & CACHESEL_HIBOUND))
+					s1 *= cslist->hibound;
+			}
+
+			/* apply the selectivity for any other seen qual */
+			if ((cslist->selmask & CACHESEL_OTHER))
+				s1 *= cslist->othersel;
 		}
+
 		/* release storage and advance */
-		rqnext = rqlist->next;
-		pfree(rqlist);
-		rqlist = rqnext;
+		csnext = cslist->next;
+		pfree(cslist);
+		cslist = csnext;
 	}
 
 	return s1;
 }
 
 /*
- * addRangeClause --- add a new range clause for clauselist_selectivity
+ * findCachedSelectivityVar
+ *	Find existing seletivity var, or add this var to the list.
+ */
+static CachedSelectivityClause *
+findCachedSelectivityVar(CachedSelectivityClause **cslist, Node *var)
+{
+	CachedSelectivityClause *cselem;
+
+	for (cselem = *cslist; cselem; cselem = cselem->next)
+	{
+		/*
+		 * We use full equal() here because the "var" might be a function of
+		 * one or more attributes of the same relation...
+		 */
+		if (equal(var, cselem->var))
+			return cselem;
+	}
+
+	/* not found -- add it */
+	cselem = (CachedSelectivityClause *) palloc(sizeof(CachedSelectivityClause));
+	cselem->var = var;
+	cselem->selmask = 0;
+
+	cselem->next = *cslist;
+	*cslist = cselem;
+	return cselem;
+}
+
+
+/*
+ * addCachedSelectivityNullTest
+ *		Cache selectivity for an IS NULL test.
+ */
+static void
+addCachedSelectivityNullTest(CachedSelectivityClause **cslist, Node *var,
+			   Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, var);
+
+	/* We can simply overwrite any previously cached selectivity here */
+	cselem->nullsel = s2;
+	cselem->selmask |= CACHESEL_NULLTEST;
+}
+
+/*
+ * addCachedSelectivityNotNullTest
+ *		Cache selectivity for an IS NOT NULL test.
+ */
+static void
+addCachedSelectivityNotNullTest(CachedSelectivityClause **cslist, Node *var,
+			   Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, var);
+
+	/* We can simply overwrite any previously cached selectivity here */
+	cselem->notnullsel = s2;
+	cselem->selmask |= CACHESEL_NOTNULLTEST;
+}
+
+/*
+ * addCachedSelectivityRangeVar
+ *		add a new range clause for clauselist_selectivity
  *
  * Here is where we try to match up pairs of range-query clauses
  */
 static void
-addRangeClause(RangeQueryClause **rqlist, Node *clause,
+addCachedSelectivityRangeVar(CachedSelectivityClause **cslist, Node *var,
 			   bool varonleft, bool isLTsel, Selectivity s2)
 {
-	RangeQueryClause *rqelem;
-	Node	   *var;
+	CachedSelectivityClause *cselem;
 	bool		is_lobound;
 
-	if (varonleft)
+	is_lobound = (varonleft != isLTsel);
+
+	cselem = findCachedSelectivityVar(cslist, var);
+
+	if (is_lobound)
 	{
-		var = get_leftop((Expr *) clause);
-		is_lobound = !isLTsel;	/* x < something is high bound */
+		if (!(cselem->selmask & CACHESEL_LOBOUND))
+		{
+			cselem->selmask |= CACHESEL_LOBOUND;
+			cselem->lobound = s2;
+		}
+		else
+		{
+			/*------
+			 * We have found two similar clauses, such as
+			 * x < y AND x < z.
+			 * Keep only the more restrictive one.
+			 *------
+			 */
+			if (cselem->lobound > s2)
+				cselem->lobound = s2;
+		}
 	}
 	else
 	{
-		var = get_rightop((Expr *) clause);
-		is_lobound = isLTsel;	/* something < x is low bound */
-	}
-
-	for (rqelem = *rqlist; rqelem; rqelem = rqelem->next)
-	{
-		/*
-		 * We use full equal() here because the "var" might be a function of
-		 * one or more attributes of the same relation...
-		 */
-		if (!equal(var, rqelem->var))
-			continue;
-		/* Found the right group to put this clause in */
-		if (is_lobound)
+		if (!(cselem->selmask & CACHESEL_HIBOUND))
 		{
-			if (!rqelem->have_lobound)
-			{
-				rqelem->have_lobound = true;
-				rqelem->lobound = s2;
-			}
-			else
-			{
-
-				/*------
-				 * We have found two similar clauses, such as
-				 * x < y AND x < z.
-				 * Keep only the more restrictive one.
-				 *------
-				 */
-				if (rqelem->lobound > s2)
-					rqelem->lobound = s2;
-			}
+			cselem->selmask |= CACHESEL_HIBOUND;
+			cselem->hibound = s2;
 		}
 		else
 		{
-			if (!rqelem->have_hibound)
-			{
-				rqelem->have_hibound = true;
-				rqelem->hibound = s2;
-			}
-			else
-			{
 
-				/*------
-				 * We have found two similar clauses, such as
-				 * x > y AND x > z.
-				 * Keep only the more restrictive one.
-				 *------
-				 */
-				if (rqelem->hibound > s2)
-					rqelem->hibound = s2;
-			}
+			/*------
+			 * We have found two similar clauses, such as
+			 * x > y AND x > z.
+			 * Keep only the more restrictive one.
+			 *------
+			 */
+			if (cselem->hibound > s2)
+				cselem->hibound = s2;
 		}
-		return;
 	}
+}
 
-	/* No matching var found, so make a new clause-pair data structure */
-	rqelem = (RangeQueryClause *) palloc(sizeof(RangeQueryClause));
-	rqelem->var = var;
-	if (is_lobound)
-	{
-		rqelem->have_lobound = true;
-		rqelem->have_hibound = false;
-		rqelem->lobound = s2;
-	}
+/*
+ * addCachedSelectivityOtherClause
+ *		Cache the selectivity of other OpExpr type expressions
+ */
+static void
+addCachedSelectivityOtherClause(CachedSelectivityClause **cslist, Node *var,
+			   Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, var);
+
+	if ((cselem->selmask & CACHESEL_OTHER))
+		cselem->othersel = cselem->othersel * s2;
 	else
 	{
-		rqelem->have_lobound = false;
-		rqelem->have_hibound = true;
-		rqelem->hibound = s2;
+		cselem->othersel = s2;
+		cselem->selmask |= CACHESEL_OTHER;
 	}
-	rqelem->next = *rqlist;
-	*rqlist = rqelem;
 }
 
 /*
#2Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#1)
Re: Making clausesel.c Smarter

On Fri, Feb 24, 2017 at 3:30 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

It would be good to improve the situation here in the back branches
too, but I'm thinking that the attached might be a little invasive for
that?

My experience has been that customers - at least EnterpriseDB
customers - do not appreciate plan changes when they upgrade to a new
minor release. Most people have production installations that are
basically working; if not, they wouldn't be in production. Even if
they're getting a good plan only by some happy accident, they're still
getting it, and a change can cause a good plan to flop over into a bad
plan, which can easily turn into a service outage. The people who had
a bad plan and get flipped to a good plan will be happy once they
realize what has changed, of course, but that's not really enough to
make up from the panicked calls from customers whose stuff falls over
when they try to apply the critical security update.

I think the basic think you are trying to accomplish is sensible,
though. I haven't reviewed the patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3David Steele
david@pgmasters.net
In reply to: Robert Haas (#2)
Re: Making clausesel.c Smarter

On 2/26/17 1:41 AM, Robert Haas wrote:

On Fri, Feb 24, 2017 at 3:30 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

It would be good to improve the situation here in the back branches
too, but I'm thinking that the attached might be a little invasive for
that?

My experience has been that customers - at least EnterpriseDB
customers - do not appreciate plan changes when they upgrade to a new
minor release. Most people have production installations that are
basically working; if not, they wouldn't be in production. Even if
they're getting a good plan only by some happy accident, they're still
getting it, and a change can cause a good plan to flop over into a bad
plan, which can easily turn into a service outage. The people who had
a bad plan and get flipped to a good plan will be happy once they
realize what has changed, of course, but that's not really enough to
make up from the panicked calls from customers whose stuff falls over
when they try to apply the critical security update.

I think the basic think you are trying to accomplish is sensible,
though. I haven't reviewed the patch.

This patch applies cleanly and compiles at cccbdde. I agree with Robert
that back patching would likely not be a good idea.

Anyone familiar with the planner available to review this patch?

It also seems like this would be a (relatively) simple patch to start
with for anyone that is interested in learning more about the planner.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Steele (#3)
Re: Making clausesel.c Smarter

David Steele <david@pgmasters.net> writes:

Anyone familiar with the planner available to review this patch?

FWIW, it's on my radar but I don't expect to get to it real soon,
as there's other stuff I deem higher priority. In the meantime,
I don't want to stand in the way of someone else looking at it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Claudio Freire
klaussfreire@gmail.com
In reply to: David Rowley (#1)
Re: Making clausesel.c Smarter

On Fri, Feb 24, 2017 at 7:00 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

I've stumbled over an interesting query out in the wild where the
query was testing a nullable timestamptz column was earlier than some
point in time, and also checking the column IS NOT NULL.

The use pattern here was that records which required processing had
this timestamp set to something other than NULL, a worker would come
along and process those, and UPDATE the record to NULL to mark the
fact that it was now processed. So what we are left with was a table
with a small number of rows with a value in this timestamp column, and
an ever-growing number of rows with a NULL value.

A highly simplified version of the query was checking for records that
required processing before some date, say:

SELECT * FROM a WHERE ts < '2017-02-24' AND ts IS NOT NULL;

Of course, the ts IS NOT NULL is not required here, but I can
understand how someone might make the mistake of adding this. The
simple solution to the problem was to have that null test removed, but
that seemingly innocent little mistake caused some pain due to very
slow running queries which held on to a nested loop plan 33 times
longer than it should have been doing. Basically what was happening
here is that clauselist_selectivity() was applying the selectivity
with the ~0.97 null_fact from pg_statistic over the top of the already
applied estimate on the number of rows below the constant timestamp.

Since the diagnosis of this problem was not instant, and some amount
of pain was suffered before the solution was found, I wondered if it
might be worth smartening up the planner a bit in this area.

We're already pretty good at handling conditions like: SELECT * FROM a
WHERE x < 10 and x < 1; where we'll effectively ignore the x < 10
estimate since x < 1 is more restrictive, so if we just build on that
ability a bit we could get enough code to cover the above case.

I've attached a draft patch which improves the situation here.

I thought to take a quick look at this patch. I'll probably take a
deeper look later, but some initial comments:

-typedef struct RangeQueryClause
+typedef struct CachedSelectivityClause
 {
-    struct RangeQueryClause *next;        /* next in linked list */
+    struct CachedSelectivityClause *next;        /* next in linked list */
     Node       *var;            /* The common variable of the clauses */
-    bool        have_lobound;    /* found a low-bound clause yet? */
-    bool        have_hibound;    /* found a high-bound clause yet? */
+    int            selmask;        /* Bitmask of which sel types are stored */
     Selectivity lobound;        /* Selectivity of a var > something clause */
     Selectivity hibound;        /* Selectivity of a var < something clause */
-} RangeQueryClause;

As seems customary in other code, perhaps you should define some
HAS_X() macros for dealing with the selmask.

Something like

#SELMASK_HAS_LOBOUND(selmask) (((selmask) & CACHESEL_LOBOUND) != 0)
etc..

+static void addCachedSelectivityRangeVar(CachedSelectivityClause
**cslist, Node *var,
bool varonleft, bool isLTsel, Selectivity s2);

You're changing clause -> var throughout the code when dealing with
nodes, but looking at their use, they hold neither. All those
addCachedSelectivity functions are usually passed expression nodes. If
you're renaming, perhaps you should rename to "expr".

On Fri, Feb 24, 2017 at 7:00 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

Now one thing I was unsure of in the patch was this "Other clauses"
concept that I've come up with. In the patch we have:

default:
addCachedSelectivityOtherClause(&cslist, var, s2);
break;

I was unsure if this should only apply to F_EQSEL and F_NEQSEL. My
imagination here won't stretch far enough to imagine a OpExpr which
passes with a NULL operand. If it did my logic is broken, and if
that's the case then I think limiting "Others" to mean F_EQSEL and
F_NEQSEL would be the workaround.

While not specifically applicable only to "Others", something needs
consideration here:

User-defined functions can be nonstrict. An OpExpr involving such a
user-defined function could possibly pass on null.

I would suggest avoiding making the assumption that it doesn't unless
the operator is strict.

One could argue that such an operator should provide its own
selectivity estimator, but the strictness check shouldn't be too
difficult to add, and I think it's a better choice.

So you'd have to check that:

default:
if (op_strict(expr->opno) && func_strict(expr->opfuncid))
addCachedSelectivityOtherClause(&cslist, var, s2);
else
s1 = s1 * s2;
break;

So, I went ahead and did that.

Considering this setup:

createdb pgtest
cat <<SQL | psql pgtest
CREATE TABLE testdata AS select null::integer as value from
generate_series(1, 100000);
VACUUM FREEZE testdata;
CREATE FUNCTION maskcheck(integer, integer) RETURNS boolean AS \$\$
BEGIN RETURN (coalesce(\$1,0) & coalesce(\$2,0)) = 0 ; END \$\$
LANGUAGE PLPGSQL;
CREATE OPERATOR && ( PROCEDURE = maskcheck, LEFTARG = integer,
RIGHTARG = integer, COMMUTATOR = && );
SQL

The following query:

EXPLAIN ANALYZE SELECT * FROM testdata WHERE value && 1 AND value is null;

The original patch yields:

QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on testdata (cost=0.00..26344.00 rows=1 width=4) (actual
time=0.367..84.861 rows=100000 loops=1)
Filter: ((value IS NULL) AND (value && 1))
Planning time: 0.371 ms
Execution time: 88.156 ms
(4 rows)

After the change, however, it yields:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on testdata (cost=0.00..26344.00 rows=50000 width=4)
(actual time=0.207..89.668 rows=100000 loops=1)
Filter: ((value IS NULL) AND (value && 1))
Planning time: 0.186 ms
Execution time: 92.978 ms
(4 rows)

Which seems much better.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6David Rowley
david.rowley@2ndquadrant.com
In reply to: Claudio Freire (#5)
1 attachment(s)
Re: Making clausesel.c Smarter

On 1 April 2017 at 16:42, Claudio Freire <klaussfreire@gmail.com> wrote:

I thought to take a quick look at this patch. I'll probably take a
deeper look later, but some initial comments:

-typedef struct RangeQueryClause
+typedef struct CachedSelectivityClause
{
-    struct RangeQueryClause *next;        /* next in linked list */
+    struct CachedSelectivityClause *next;        /* next in linked list */
Node       *var;            /* The common variable of the clauses */
-    bool        have_lobound;    /* found a low-bound clause yet? */
-    bool        have_hibound;    /* found a high-bound clause yet? */
+    int            selmask;        /* Bitmask of which sel types are stored */
Selectivity lobound;        /* Selectivity of a var > something clause */
Selectivity hibound;        /* Selectivity of a var < something clause */
-} RangeQueryClause;

As seems customary in other code, perhaps you should define some
HAS_X() macros for dealing with the selmask.

Something like

#SELMASK_HAS_LOBOUND(selmask) (((selmask) & CACHESEL_LOBOUND) != 0)
etc..

Thanks for looking at the patch.

The problem with that is that some of the tests are doing more than
just checking a single flag is enabled.

Consider:

if ((cslist->selmask & (CACHESEL_LOBOUND | CACHESEL_HIBOUND)) ==
(CACHESEL_LOBOUND | CACHESEL_HIBOUND))

Of course, those could be Macro'd too, but it seems I might need to be
inventive with the names, or the meaning might not be all that clear.

It does not seem overly complex and it is isolated to a single file,
so perhaps it might be OK as is?

+static void addCachedSelectivityRangeVar(CachedSelectivityClause
**cslist, Node *var,
bool varonleft, bool isLTsel, Selectivity s2);

You're changing clause -> var throughout the code when dealing with
nodes, but looking at their use, they hold neither. All those
addCachedSelectivity functions are usually passed expression nodes. If
you're renaming, perhaps you should rename to "expr".

hmm, this is true. I kind of followed the lead of the name of the
variable in the old RangeQueryClause struct. I have changed the names
of these to be expr in the attached, but I didn't change the name of
the Var in the new CachedSelectivityClause struct. It seems like a
change not related to this patch.

Do you think I should change that too?

On Fri, Feb 24, 2017 at 7:00 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

Now one thing I was unsure of in the patch was this "Other clauses"
concept that I've come up with. In the patch we have:

default:
addCachedSelectivityOtherClause(&cslist, var, s2);
break;

I was unsure if this should only apply to F_EQSEL and F_NEQSEL. My
imagination here won't stretch far enough to imagine a OpExpr which
passes with a NULL operand. If it did my logic is broken, and if
that's the case then I think limiting "Others" to mean F_EQSEL and
F_NEQSEL would be the workaround.

While not specifically applicable only to "Others", something needs
consideration here:

User-defined functions can be nonstrict. An OpExpr involving such a
user-defined function could possibly pass on null.

Good point. I overlooked this.

I would suggest avoiding making the assumption that it doesn't unless
the operator is strict.

One could argue that such an operator should provide its own
selectivity estimator, but the strictness check shouldn't be too
difficult to add, and I think it's a better choice.

So you'd have to check that:

default:
if (op_strict(expr->opno) && func_strict(expr->opfuncid))
addCachedSelectivityOtherClause(&cslist, var, s2);
else
s1 = s1 * s2;
break;

I've changed it to something along those lines. I don't think the
func_strict here is required, though, so I've gone with just
op_strict().

Updated patch attached.

Thanks for reviewing it.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

smarter_clausesel_2017-04-03.patchapplication/octet-stream; name=smarter_clausesel_2017-04-03.patchDownload
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index af2934a..7453329 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -23,23 +23,39 @@
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
 
+#define CACHESEL_LOBOUND		0x0001	/* has var > something selectivity */
+#define CACHESEL_HIBOUND		0x0002	/* has var < something selectivity */
+#define CACHESEL_NULLTEST		0x0004	/* has var IS NULL selectivity */
+#define CACHESEL_NOTNULLTEST	0x0008	/* has var IS NOT NULL selectivity */
+#define CACHESEL_OTHERSTRICT	0x0010	/* has another OpExpr selectivity */
 
 /*
- * Data structure for accumulating info about possible range-query
- * clause pairs in clauselist_selectivity.
+ * Data structure for caching selectivity for clauselist_selectivity.
  */
-typedef struct RangeQueryClause
+typedef struct CachedSelectivityClause
 {
-	struct RangeQueryClause *next;		/* next in linked list */
+	struct CachedSelectivityClause *next;		/* next in linked list */
 	Node	   *var;			/* The common variable of the clauses */
-	bool		have_lobound;	/* found a low-bound clause yet? */
-	bool		have_hibound;	/* found a high-bound clause yet? */
+	int			selmask;		/* Bitmask of which sel types are stored */
 	Selectivity lobound;		/* Selectivity of a var > something clause */
 	Selectivity hibound;		/* Selectivity of a var < something clause */
-} RangeQueryClause;
+	Selectivity nullsel;		/* Selectivity of a IS NULL test */
+	Selectivity notnullsel;		/* Selectivity of a IS NOT NULL test */
+	Selectivity otherstrictsel; /* Selectivity of any other strict clauses */
+} CachedSelectivityClause;
 
-static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
-			   bool varonleft, bool isLTsel, Selectivity s2);
+
+static CachedSelectivityClause *findCachedSelectivityVar(
+						 CachedSelectivityClause **cslist, Node *expr);
+static void addCachedSelectivityNullTest(CachedSelectivityClause **cslist,
+							 Node *expr, Selectivity s2);
+static void addCachedSelectivityNotNullTest(CachedSelectivityClause **cslist,
+								Node *expr, Selectivity s2);
+static void addCachedSelectivityRangeVar(CachedSelectivityClause **cslist,
+				   Node *expr, bool varonleft, bool isLTsel, Selectivity s2);
+static void addCachedSelectivityOtherStrictClause(
+									  CachedSelectivityClause **cslist,
+									  Node *expr, Selectivity s2);
 
 
 /****************************************************************************
@@ -60,27 +76,38 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
  * subclauses.  However, that's only right if the subclauses have independent
  * probabilities, and in reality they are often NOT independent.  So,
  * we want to be smarter where we can.
-
- * Currently, the only extra smarts we have is to recognize "range queries",
- * such as "x > 34 AND x < 42".  Clauses are recognized as possible range
- * query components if they are restriction opclauses whose operators have
+ *
+ * Currently we only handle OpExprs with 2 operands, and IS [NOT] NULL tests.
+ * We have extra smarts to recognize "range queries", such as
+ * "x > 34 AND x < 42".  These clauses are recognized as possible range query
+ * components if they are restriction opclauses whose operators have
  * scalarltsel() or scalargtsel() as their restriction selectivity estimator.
- * We pair up clauses of this form that refer to the same variable.  An
- * unpairable clause of this kind is simply multiplied into the selectivity
- * product in the normal way.  But when we find a pair, we know that the
- * selectivities represent the relative positions of the low and high bounds
- * within the column's range, so instead of figuring the selectivity as
- * hisel * losel, we can figure it as hisel + losel - 1.  (To visualize this,
- * see that hisel is the fraction of the range below the high bound, while
- * losel is the fraction above the low bound; so hisel can be interpreted
- * directly as a 0..1 value but we need to convert losel to 1-losel before
- * interpreting it as a value.  Then the available range is 1-losel to hisel.
- * However, this calculation double-excludes nulls, so really we need
- * hisel + losel + null_frac - 1.)
+ * IS [NOT] NULL tests are handled too, it's a common enough, and perhaps
+ * innocent enough looking mistake to include a IS NOT NULL test along with
+ * another condition for the same var in a query, so its useful in this case
+ * to treat the "IS NOT NULL" as a no-op. However, since we only collect
+ * OpExprs and NullTests, we're only going to detect this NullTest issue in
+ * the cases where the other found qual is an OpExpr.
+ *
+ * We collect up clauses so that we're able to conditionally apply each
+ * selectivity based on the other clauses which have been seen for the same
+ * Var.  Any clause which is unpaired is simply multiplied into the
+ * selectivity product in the normal way.  But when we find a pair, we can
+ * conditionally apply the selectivities.  This also allows us to detect weird
+ * corner cases such as "x = 1 AND x IS NULL".  With range pairs, we know that
+ * the selectivities represent the relative positions of the low and high
+ * bounds within the column's range, so instead of figuring the selectivity
+ * as hisel * losel, we can figure it as hisel + losel - 1.  (To visualize
+ * this, see that hisel is the fraction of the range below the high bound,
+ * while losel is the fraction above the low bound; so hisel can be
+ * interpreted directly as a 0..1 value but we need to convert losel to
+ * 1-losel before interpreting it as a value.  Then the available range is
+ * 1-losel to hisel. However, this calculation double-excludes nulls, so
+ * really we need hisel + losel + null_frac - 1.)
  *
- * If either selectivity is exactly DEFAULT_INEQ_SEL, we forget this equation
- * and instead use DEFAULT_RANGE_INEQ_SEL.  The same applies if the equation
- * yields an impossible (negative) result.
+ * For range pairs, if either selectivity is exactly DEFAULT_INEQ_SEL, we
+ * forget this equation and instead use DEFAULT_RANGE_INEQ_SEL.  The same
+ * applies if the equation yields an impossible (negative) result.
  *
  * A free side-effect is that we can recognize redundant inequalities such
  * as "x < 4 AND x < 5"; only the tighter constraint will be counted.
@@ -96,7 +123,7 @@ clauselist_selectivity(PlannerInfo *root,
 					   SpecialJoinInfo *sjinfo)
 {
 	Selectivity s1 = 1.0;
-	RangeQueryClause *rqlist = NULL;
+	CachedSelectivityClause *cslist = NULL;
 	ListCell   *l;
 
 	/*
@@ -140,6 +167,20 @@ clauselist_selectivity(PlannerInfo *root,
 		else
 			rinfo = NULL;
 
+		if (IsA(clause, NullTest))
+		{
+			NullTestType nulltesttype = ((NullTest *) clause)->nulltesttype;
+
+			if (nulltesttype == IS_NULL)
+				addCachedSelectivityNullTest(&cslist,
+									(Node *) ((NullTest *) clause)->arg, s2);
+			else if (nulltesttype == IS_NOT_NULL)
+				addCachedSelectivityNotNullTest(&cslist,
+									(Node *) ((NullTest *) clause)->arg, s2);
+
+			continue;			/* drop to loop bottom */
+		}
+
 		/*
 		 * See if it looks like a restriction clause with a pseudoconstant on
 		 * one side.  (Anything more complicated than that might not behave in
@@ -151,26 +192,35 @@ clauselist_selectivity(PlannerInfo *root,
 			OpExpr	   *expr = (OpExpr *) clause;
 			bool		varonleft = true;
 			bool		ok;
+			Node	   *leftexpr = (Node *) linitial(expr->args);
+			Node	   *rightexpr = (Node *) lsecond(expr->args);
 
 			if (rinfo)
 			{
 				ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
-					(is_pseudo_constant_clause_relids(lsecond(expr->args),
+					(is_pseudo_constant_clause_relids(rightexpr,
 													  rinfo->right_relids) ||
 					 (varonleft = false,
-					  is_pseudo_constant_clause_relids(linitial(expr->args),
+					  is_pseudo_constant_clause_relids(leftexpr,
 													   rinfo->left_relids)));
 			}
 			else
 			{
 				ok = (NumRelids(clause) == 1) &&
-					(is_pseudo_constant_clause(lsecond(expr->args)) ||
+					(is_pseudo_constant_clause(rightexpr) ||
 					 (varonleft = false,
-					  is_pseudo_constant_clause(linitial(expr->args))));
+					  is_pseudo_constant_clause(leftexpr)));
 			}
 
 			if (ok)
 			{
+				Node	   *var;
+
+				if (varonleft)
+					var = leftexpr;
+				else
+					var = rightexpr;
+
 				/*
 				 * If it's not a "<" or ">" operator, just merge the
 				 * selectivity in generically.  But if it's the right oprrest,
@@ -179,16 +229,25 @@ clauselist_selectivity(PlannerInfo *root,
 				switch (get_oprrest(expr->opno))
 				{
 					case F_SCALARLTSEL:
-						addRangeClause(&rqlist, clause,
-									   varonleft, true, s2);
+						addCachedSelectivityRangeVar(&cslist, var,
+													 varonleft, true, s2);
 						break;
 					case F_SCALARGTSEL:
-						addRangeClause(&rqlist, clause,
-									   varonleft, false, s2);
+						addCachedSelectivityRangeVar(&cslist, var,
+													 varonleft, false, s2);
 						break;
 					default:
-						/* Just merge the selectivity in generically */
-						s1 = s1 * s2;
+
+						/*
+						 * Cache all strict clauses in selectivity other.
+						 * Anything non-strict we'll just apply the
+						 * selectivity now, since we're currently unable to do
+						 * anything particularly smart with it.
+						 */
+						if (op_strict(expr->opno))
+							addCachedSelectivityOtherStrictClause(&cslist, var, s2);
+						else
+							s1 = s1 * s2;
 						break;
 				}
 				continue;		/* drop to loop bottom */
@@ -200,176 +259,259 @@ clauselist_selectivity(PlannerInfo *root,
 	}
 
 	/*
-	 * Now scan the rangequery pair list.
+	 * Now scan the cached selectivity list
 	 */
-	while (rqlist != NULL)
+	while (cslist != NULL)
 	{
-		RangeQueryClause *rqnext;
+		CachedSelectivityClause *csnext;
 
-		if (rqlist->have_lobound && rqlist->have_hibound)
+		if ((cslist->selmask & CACHESEL_NULLTEST))
 		{
-			/* Successfully matched a pair of range clauses */
-			Selectivity s2;
-
 			/*
-			 * Exact equality to the default value probably means the
-			 * selectivity function punted.  This is not airtight but should
-			 * be good enough.
+			 * if null test is not the only flag then there can be no matching
+			 * rows at all.
 			 */
-			if (rqlist->hibound == DEFAULT_INEQ_SEL ||
-				rqlist->lobound == DEFAULT_INEQ_SEL)
+			if (cslist->selmask != CACHESEL_NULLTEST)
 			{
-				s2 = DEFAULT_RANGE_INEQ_SEL;
+				s1 = 0;
+				break;			/* nothing more needs estimated */
 			}
 			else
-			{
-				s2 = rqlist->hibound + rqlist->lobound - 1.0;
+				s1 *= cslist->nullsel;
+		}
 
-				/* Adjust for double-exclusion of NULLs */
-				s2 += nulltestsel(root, IS_NULL, rqlist->var,
-								  varRelid, jointype, sjinfo);
+		/*
+		 * An IS NOT NULL test is a no-op if there's any other strict quals,
+		 * so if that's the case, then we'll only apply this, otherwise we'll
+		 * ignore it.
+		 */
+		else if (cslist->selmask == CACHESEL_NOTNULLTEST)
+			s1 *= cslist->notnullsel;
+
+		else
+		{
+			/* Check if both lobound and hibound were seen */
+			if ((cslist->selmask & (CACHESEL_LOBOUND | CACHESEL_HIBOUND)) ==
+				(CACHESEL_LOBOUND | CACHESEL_HIBOUND))
+			{
+				/* Successfully matched a pair of range clauses */
+				Selectivity s2;
 
 				/*
-				 * A zero or slightly negative s2 should be converted into a
-				 * small positive value; we probably are dealing with a very
-				 * tight range and got a bogus result due to roundoff errors.
-				 * However, if s2 is very negative, then we probably have
-				 * default selectivity estimates on one or both sides of the
-				 * range that we failed to recognize above for some reason.
+				 * Exact equality to the default value probably means the
+				 * selectivity function punted.  This is not airtight but
+				 * should be good enough.
 				 */
-				if (s2 <= 0.0)
+				if (cslist->hibound == DEFAULT_INEQ_SEL ||
+					cslist->lobound == DEFAULT_INEQ_SEL)
 				{
-					if (s2 < -0.01)
-					{
-						/*
-						 * No data available --- use a default estimate that
-						 * is small, but not real small.
-						 */
-						s2 = DEFAULT_RANGE_INEQ_SEL;
-					}
-					else
+					s2 = DEFAULT_RANGE_INEQ_SEL;
+				}
+				else
+				{
+					s2 = cslist->hibound + cslist->lobound - 1.0;
+
+					/* Adjust for double-exclusion of NULLs */
+					s2 += nulltestsel(root, IS_NULL, cslist->var,
+									  varRelid, jointype, sjinfo);
+
+					/*
+					 * A zero or slightly negative s2 should be converted into
+					 * a small positive value; we probably are dealing with a
+					 * very tight range and got a bogus result due to roundoff
+					 * errors. However, if s2 is very negative, then we
+					 * probably have default selectivity estimates on one or
+					 * both sides of the range that we failed to recognize
+					 * above for some reason.
+					 */
+					if (s2 <= 0.0)
 					{
-						/*
-						 * It's just roundoff error; use a small positive
-						 * value
-						 */
-						s2 = 1.0e-10;
+						if (s2 < -0.01)
+						{
+							/*
+							 * No data available --- use a default estimate
+							 * that is small, but not real small.
+							 */
+							s2 = DEFAULT_RANGE_INEQ_SEL;
+						}
+						else
+						{
+							/*
+							 * It's just roundoff error; use a small positive
+							 * value
+							 */
+							s2 = 1.0e-10;
+						}
 					}
 				}
+				/* Merge in the selectivity of the pair of clauses */
+				s1 *= s2;
 			}
-			/* Merge in the selectivity of the pair of clauses */
-			s1 *= s2;
-		}
-		else
-		{
-			/* Only found one of a pair, merge it in generically */
-			if (rqlist->have_lobound)
-				s1 *= rqlist->lobound;
 			else
-				s1 *= rqlist->hibound;
+			{
+				/* Only found one of a range pair, merge it in generically */
+				if ((cslist->selmask & CACHESEL_LOBOUND))
+					s1 *= cslist->lobound;
+				else if ((cslist->selmask & CACHESEL_HIBOUND))
+					s1 *= cslist->hibound;
+			}
+
+			/* apply the selectivity for any other seen strict qual */
+			if ((cslist->selmask & CACHESEL_OTHERSTRICT))
+				s1 *= cslist->otherstrictsel;
 		}
+
 		/* release storage and advance */
-		rqnext = rqlist->next;
-		pfree(rqlist);
-		rqlist = rqnext;
+		csnext = cslist->next;
+		pfree(cslist);
+		cslist = csnext;
 	}
 
 	return s1;
 }
 
 /*
- * addRangeClause --- add a new range clause for clauselist_selectivity
+ * findCachedSelectivityVar
+ *	Find existing seletivity var, or add this var to the list.
+ */
+static CachedSelectivityClause *
+findCachedSelectivityVar(CachedSelectivityClause **cslist, Node *expr)
+{
+	CachedSelectivityClause *cselem;
+
+	for (cselem = *cslist; cselem; cselem = cselem->next)
+	{
+		/*
+		 * We use full equal() here because the "var" might be a function of
+		 * one or more attributes of the same relation...
+		 */
+		if (equal(expr, cselem->var))
+			return cselem;
+	}
+
+	/* not found -- add it */
+	cselem = (CachedSelectivityClause *) palloc(sizeof(CachedSelectivityClause));
+	cselem->var = expr;
+	cselem->selmask = 0;
+
+	cselem->next = *cslist;
+	*cslist = cselem;
+	return cselem;
+}
+
+
+/*
+ * addCachedSelectivityNullTest
+ *		Cache selectivity for an IS NULL test.
+ */
+static void
+addCachedSelectivityNullTest(CachedSelectivityClause **cslist, Node *expr,
+							 Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	/* We can simply overwrite any previously cached selectivity here */
+	cselem->nullsel = s2;
+	cselem->selmask |= CACHESEL_NULLTEST;
+}
+
+/*
+ * addCachedSelectivityNotNullTest
+ *		Cache selectivity for an IS NOT NULL test.
+ */
+static void
+addCachedSelectivityNotNullTest(CachedSelectivityClause **cslist, Node *expr,
+								Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	/* We can simply overwrite any previously cached selectivity here */
+	cselem->notnullsel = s2;
+	cselem->selmask |= CACHESEL_NOTNULLTEST;
+}
+
+/*
+ * addCachedSelectivityRangeVar
+ *		add a new range clause for clauselist_selectivity
  *
  * Here is where we try to match up pairs of range-query clauses
  */
 static void
-addRangeClause(RangeQueryClause **rqlist, Node *clause,
-			   bool varonleft, bool isLTsel, Selectivity s2)
+addCachedSelectivityRangeVar(CachedSelectivityClause **cslist, Node *expr,
+							 bool varonleft, bool isLTsel, Selectivity s2)
 {
-	RangeQueryClause *rqelem;
-	Node	   *var;
+	CachedSelectivityClause *cselem;
 	bool		is_lobound;
 
-	if (varonleft)
+	is_lobound = (varonleft != isLTsel);
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	if (is_lobound)
 	{
-		var = get_leftop((Expr *) clause);
-		is_lobound = !isLTsel;	/* x < something is high bound */
+		if (!(cselem->selmask & CACHESEL_LOBOUND))
+		{
+			cselem->selmask |= CACHESEL_LOBOUND;
+			cselem->lobound = s2;
+		}
+		else
+		{
+			/*------
+			 * We have found two similar clauses, such as
+			 * x < y AND x < z.
+			 * Keep only the more restrictive one.
+			 *------
+			 */
+			if (cselem->lobound > s2)
+				cselem->lobound = s2;
+		}
 	}
 	else
 	{
-		var = get_rightop((Expr *) clause);
-		is_lobound = isLTsel;	/* something < x is low bound */
-	}
-
-	for (rqelem = *rqlist; rqelem; rqelem = rqelem->next)
-	{
-		/*
-		 * We use full equal() here because the "var" might be a function of
-		 * one or more attributes of the same relation...
-		 */
-		if (!equal(var, rqelem->var))
-			continue;
-		/* Found the right group to put this clause in */
-		if (is_lobound)
+		if (!(cselem->selmask & CACHESEL_HIBOUND))
 		{
-			if (!rqelem->have_lobound)
-			{
-				rqelem->have_lobound = true;
-				rqelem->lobound = s2;
-			}
-			else
-			{
-
-				/*------
-				 * We have found two similar clauses, such as
-				 * x < y AND x < z.
-				 * Keep only the more restrictive one.
-				 *------
-				 */
-				if (rqelem->lobound > s2)
-					rqelem->lobound = s2;
-			}
+			cselem->selmask |= CACHESEL_HIBOUND;
+			cselem->hibound = s2;
 		}
 		else
 		{
-			if (!rqelem->have_hibound)
-			{
-				rqelem->have_hibound = true;
-				rqelem->hibound = s2;
-			}
-			else
-			{
 
-				/*------
-				 * We have found two similar clauses, such as
-				 * x > y AND x > z.
-				 * Keep only the more restrictive one.
-				 *------
-				 */
-				if (rqelem->hibound > s2)
-					rqelem->hibound = s2;
-			}
+			/*------
+			 * We have found two similar clauses, such as
+			 * x > y AND x > z.
+			 * Keep only the more restrictive one.
+			 *------
+			 */
+			if (cselem->hibound > s2)
+				cselem->hibound = s2;
 		}
-		return;
 	}
+}
 
-	/* No matching var found, so make a new clause-pair data structure */
-	rqelem = (RangeQueryClause *) palloc(sizeof(RangeQueryClause));
-	rqelem->var = var;
-	if (is_lobound)
-	{
-		rqelem->have_lobound = true;
-		rqelem->have_hibound = false;
-		rqelem->lobound = s2;
-	}
+/*
+ * addCachedSelectivityOtherStrictClause
+ *		Cache the selectivity of other OpExpr type expressions which are
+ *		strict
+ */
+static void
+addCachedSelectivityOtherStrictClause(CachedSelectivityClause **cslist, Node *expr,
+									  Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	if ((cselem->selmask & CACHESEL_OTHERSTRICT))
+		cselem->otherstrictsel = cselem->otherstrictsel * s2;
 	else
 	{
-		rqelem->have_lobound = false;
-		rqelem->have_hibound = true;
-		rqelem->hibound = s2;
+		cselem->otherstrictsel = s2;
+		cselem->selmask |= CACHESEL_OTHERSTRICT;
 	}
-	rqelem->next = *rqlist;
-	*rqlist = rqelem;
 }
 
 /*
#7Andres Freund
andres@anarazel.de
In reply to: David Rowley (#6)
Re: Making clausesel.c Smarter

On 2017-04-03 20:59:42 +1200, David Rowley wrote:

Updated patch attached.

Thanks for reviewing it.

Given the time in the release cycle I'm afraid that this it's too late
to get this into v10. Does anybody disagree? If not, it should be
moved to the next CF.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Claudio Freire
klaussfreire@gmail.com
In reply to: Andres Freund (#7)
Re: Making clausesel.c Smarter

On Mon, Apr 3, 2017 at 5:24 PM, Andres Freund <andres@anarazel.de> wrote:

On 2017-04-03 20:59:42 +1200, David Rowley wrote:

Updated patch attached.

Thanks for reviewing it.

Given the time in the release cycle I'm afraid that this it's too late
to get this into v10. Does anybody disagree? If not, it should be
moved to the next CF.

Greetings,

Andres Freund

While personally I'd like to see this patch make it (I got bitten by
this more than once), I can't argue with that logic with the feature
freeze almost upon us, especially given the stage the review is at
(ie: just beginning).

I'm still going to review it though.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9David Rowley
david.rowley@2ndquadrant.com
In reply to: Andres Freund (#7)
Re: Making clausesel.c Smarter

On 4 April 2017 at 08:24, Andres Freund <andres@anarazel.de> wrote:

On 2017-04-03 20:59:42 +1200, David Rowley wrote:

Updated patch attached.

Thanks for reviewing it.

Given the time in the release cycle I'm afraid that this it's too late
to get this into v10. Does anybody disagree? If not, it should be
moved to the next CF.

I strongly disagree. The time in the release cycle is the final
commitfest. This is when patches are commited to the repository. The
exception to this is that no large invasive patches should arrive new
in the final commitfest. This is not one of those.

Tom has already mentioned he'd like to look at this. If you're not
willing, then please just don't look at it, but please also don't
remove other peoples opportunity for doing so.

Please explain your logic for thinking otherwise. Have I somehow
misunderstood what the 1 week extension means?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@anarazel.de
In reply to: David Rowley (#9)
Re: Making clausesel.c Smarter

On 2017-04-04 09:22:07 +1200, David Rowley wrote:

On 4 April 2017 at 08:24, Andres Freund <andres@anarazel.de> wrote:

On 2017-04-03 20:59:42 +1200, David Rowley wrote:

Updated patch attached.

Thanks for reviewing it.

Given the time in the release cycle I'm afraid that this it's too late
to get this into v10. Does anybody disagree? If not, it should be
moved to the next CF.

I strongly disagree. The time in the release cycle is the final
commitfest. This is when patches are commited to the repository. The
exception to this is that no large invasive patches should arrive new
in the final commitfest. This is not one of those.

Right.

Tom has already mentioned he'd like to look at this.

He also said that it'll be a while, and he hadn't yet by the time of the
original freeze date.

Please explain your logic for thinking otherwise. Have I somehow
misunderstood what the 1 week extension means?

Well, we have a lot of pending work, some of that has been pending for
years essentially. And Tom hadn't looked at it yet, and he wasn't at
pgconf.us, so he wasn't actually delayed due to that...

I'm fine with not punting right now, but we need to reduce the number of
patches and focus on stuff that's realistic to get in. There's 4 days
and ~60 open CF entries. Please help whip other patches into shape...

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Claudio Freire
klaussfreire@gmail.com
In reply to: David Rowley (#9)
Re: Making clausesel.c Smarter

On Mon, Apr 3, 2017 at 6:22 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 4 April 2017 at 08:24, Andres Freund <andres@anarazel.de> wrote:

On 2017-04-03 20:59:42 +1200, David Rowley wrote:

Updated patch attached.

Thanks for reviewing it.

Given the time in the release cycle I'm afraid that this it's too late
to get this into v10. Does anybody disagree? If not, it should be
moved to the next CF.

I strongly disagree. The time in the release cycle is the final
commitfest. This is when patches are commited to the repository. The
exception to this is that no large invasive patches should arrive new
in the final commitfest. This is not one of those.

Selectivity misestimations can have a huge impact on query
performance, and that could be a big deal if there's a big regression
on some queries.

But, the beta period could be enough to detect any issues there. I'll
leave that decision to committers or commitfest managers I guess.

Tom has already mentioned he'd like to look at this. If you're not
willing, then please just don't look at it, but please also don't
remove other peoples opportunity for doing so.

Please explain your logic for thinking otherwise. Have I somehow
misunderstood what the 1 week extension means?

But Tom hasn't yet, and even if he does, we'd have to be done with the
review within this week.

In any case, I intend to go on with the review later today, and at
first look the patch seems in good shape (though it'll take a deeper
look before I make that my official review). Most other patches in
need of review are a bit complex for my knowledge of the code base,
and I already started reviewing this one. We could perhaps be done by
the end of the week.

We could defer that decision to within 4 days. If we're done, we're
done. If we're not, move to next CF.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Claudio Freire
klaussfreire@gmail.com
In reply to: David Rowley (#6)
Re: Making clausesel.c Smarter

On Mon, Apr 3, 2017 at 5:59 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

+static void addCachedSelectivityRangeVar(CachedSelectivityClause
**cslist, Node *var,
bool varonleft, bool isLTsel, Selectivity s2);

You're changing clause -> var throughout the code when dealing with
nodes, but looking at their use, they hold neither. All those
addCachedSelectivity functions are usually passed expression nodes. If
you're renaming, perhaps you should rename to "expr".

hmm, this is true. I kind of followed the lead of the name of the
variable in the old RangeQueryClause struct. I have changed the names
of these to be expr in the attached, but I didn't change the name of
the Var in the new CachedSelectivityClause struct. It seems like a
change not related to this patch.

Do you think I should change that too?

I'd prefer it if all occurrences of the concept were changed, to
maintain consistency.
That would include variables used to hold expressions that refer to
these as well, as in the case of:

+                Node       *var;
+
+                if (varonleft)
+                    var = leftexpr;
+                else
+                    var = rightexpr;
+

I would suggest avoiding making the assumption that it doesn't unless
the operator is strict.

One could argue that such an operator should provide its own
selectivity estimator, but the strictness check shouldn't be too
difficult to add, and I think it's a better choice.

So you'd have to check that:

default:
if (op_strict(expr->opno) && func_strict(expr->opfuncid))
addCachedSelectivityOtherClause(&cslist, var, s2);
else
s1 = s1 * s2;
break;

I've changed it to something along those lines. I don't think the
func_strict here is required, though, so I've gone with just
op_strict().

Indeed, I realized right after typing it, I thought I had corrected it
before I hit send. Seems I hadn't. Good thing you noticed.

One last observation:

+        /*
+         * An IS NOT NULL test is a no-op if there's any other strict quals,
+         * so if that's the case, then we'll only apply this, otherwise we'll
+         * ignore it.
+         */
+        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
+            s1 *= cslist->notnullsel;

In some paths, the range selectivity estimation code punts and returns
DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
present, it should still be applied. It could make a big difference if
the selectivity for the nulltest is high.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13David Rowley
david.rowley@2ndquadrant.com
In reply to: Claudio Freire (#12)
Re: Making clausesel.c Smarter

On 4 April 2017 at 11:35, Claudio Freire <klaussfreire@gmail.com> wrote:

I'd prefer it if all occurrences of the concept were changed, to
maintain consistency.
That would include variables used to hold expressions that refer to
these as well, as in the case of:

+                Node       *var;
+
+                if (varonleft)
+                    var = leftexpr;
+                else
+                    var = rightexpr;
+

Thanks for looking again.
I didn't change that one as there's already a variable named "expr" in
the scope. I thought changing that would mean code churn that I don't
really want to add to the patch. Someone else might come along and ask
me why I'm renaming this unrelated variable. I kinda of think that if
it was var before when it meant expr, then it's not the business of
this patch to clean that up. I didn't rename the struct member, for
example, as the meaning is no different than before.

One last observation:

+        /*
+         * An IS NOT NULL test is a no-op if there's any other strict quals,
+         * so if that's the case, then we'll only apply this, otherwise we'll
+         * ignore it.
+         */
+        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
+            s1 *= cslist->notnullsel;

In some paths, the range selectivity estimation code punts and returns
DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
present, it should still be applied. It could make a big difference if
the selectivity for the nulltest is high.

I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
should apply the nullfrac. Seems wrong to rely on a IS NOT NULL test
to exists to estimate that properly. I don't think that needs done as
part of this patch. I'd rather limit the scope since "returned with
feedback" is already knocking at the door of this patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Claudio Freire
klaussfreire@gmail.com
In reply to: David Rowley (#13)
Re: Making clausesel.c Smarter

On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

One last observation:

+        /*
+         * An IS NOT NULL test is a no-op if there's any other strict quals,
+         * so if that's the case, then we'll only apply this, otherwise we'll
+         * ignore it.
+         */
+        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
+            s1 *= cslist->notnullsel;

In some paths, the range selectivity estimation code punts and returns
DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
present, it should still be applied. It could make a big difference if
the selectivity for the nulltest is high.

I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
should apply the nullfrac. Seems wrong to rely on a IS NOT NULL test
to exists to estimate that properly. I don't think that needs done as
part of this patch. I'd rather limit the scope since "returned with
feedback" is already knocking at the door of this patch.

I agree, but this patch regresses for those cases if the user
compensated with a not null test, leaving little recourse, so I'd fix
it in this patch to avoid the regression.

Maybe you're right that the null fraction should be applied regardless
of whether there's an explicit null check, though.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Claudio Freire
klaussfreire@gmail.com
In reply to: Claudio Freire (#14)
Re: Making clausesel.c Smarter

On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfreire@gmail.com> wrote:

On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

One last observation:

+        /*
+         * An IS NOT NULL test is a no-op if there's any other strict quals,
+         * so if that's the case, then we'll only apply this, otherwise we'll
+         * ignore it.
+         */
+        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
+            s1 *= cslist->notnullsel;

In some paths, the range selectivity estimation code punts and returns
DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
present, it should still be applied. It could make a big difference if
the selectivity for the nulltest is high.

I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
should apply the nullfrac. Seems wrong to rely on a IS NOT NULL test
to exists to estimate that properly. I don't think that needs done as
part of this patch. I'd rather limit the scope since "returned with
feedback" is already knocking at the door of this patch.

I agree, but this patch regresses for those cases if the user
compensated with a not null test, leaving little recourse, so I'd fix
it in this patch to avoid the regression.

Maybe you're right that the null fraction should be applied regardless
of whether there's an explicit null check, though.

The more I read that code, the more I'm convinced there's no way to
get a DEFAULT_RANGE_INEQ_SEL if (rqlist->hibound != DEFAULT_INEQ_SEL
&&
rqlist->lobound != DEFAULT_INEQ_SEL) other than from contradictory
clauses, a case the current code handles very poorly, returning
DEFAULT_RANGE_INEQ_SEL and ignoring nullfrac, something that could
give way-off estimates if the table had lots of nulls.

Say, consider if "value" was mostly null and you write:

SELECT * FROM testdata WHERE value BETWEEN 10 AND 20 AND value BETWEEN
50 AND 60;

All other cases I can think of would end with either hibound or
lobound being equal to DEFAULT_INEQ_SEL.

It seems to me, doing a git blame, that the checks in the else branch
were left only as a precaution, one that seems unnecessary.

If you ask me, I'd just leave:

+ if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
DEFAULT_INEQ_SEL)
+ {
+     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
implies we have no stats */
+     s2 = DEFAULT_RANGE_INEQ_SEL;
+ }
+ else
+ {
...
+   s2 = rqlist->hibound + rqlist->lobound - 1.0;
+   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
+   notnullsel = 1.0 - nullsel;
+
+   /* Adjust for double-exclusion of NULLs */
+   s2 += nullsel;
+   if (s2 <= 0.0) {
+      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
+      {
+           /* Most likely contradictory clauses found */
+           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
+      }
+      else
+      {
+           /* Could be a rounding error */
+           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
+      }
+   }
+ }

Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
cautious) estimation of the amount of rounding error that could be
there with 32-bit floats.

The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
an estimation based on stats, maybe approximate, but one that makes
sense (ie: preserves the monotonicity of the CPF), and as such
negative values are either sign of a contradiction or rounding error.
The previous code left the possibility of negatives coming out of
default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
but that doesn't seem possible coming out of scalarineqsel.

But I'd like it if Tom could comment on that, since he's the one that
did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
in 2004).

Barring that, I'd just change the

s2 = DEFAULT_RANGE_INEQ_SEL;

To

s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;

Which makes a lot more sense.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#6)
Re: Making clausesel.c Smarter

On 3 April 2017 at 20:59, David Rowley <david.rowley@2ndquadrant.com> wrote:

Updated patch attached.

I did a little benchmarking on this to learn how planning time is affected.

Master = 9fa6e08d4a16f9b0461743cff35781e16308c106
Patched = 9fa6e08d4a16f9b0461743cff35781e16308c106 +
smarter_clausesel_2017-04-03.patch

Config:

All standard

Setup:
create table abcd (a int, b int, c int, d int, arr int[]);

Tests:

Test1: explain select * from abcd where a = 1 and b = 1 and c = 1 and d = 1;
Test2: explain select * from abcd where a >= 1 and a <= 10 and b = 1 and c = 1;
Test3: explain select * from abcd a1 inner join abcd a2 on a1.a = a2.a
where a1.b = 1 and a2.b = 1 and a1.c = 1 and a2.c = 1;
Test4: (output of) select 'explain select * from abcd where ' ||
string_agg('arr[' || x::Text || '] = 1',' AND ') from
generate_series(0,999) x;
Test5: (output of) select 'explain select * from abcd where ' ||
string_agg('arr[' || x::Text || '] > 1',' AND ') from
generate_series(0,999) x;
Test6: explain select * from abcd;

Tests were performed running pgbench -T 60 -n

Raw Results:

Master

Test 1
tps = 6993.677785 (excluding connections establishing)
tps = 7072.796544 (excluding connections establishing)
tps = 6877.428676 (excluding connections establishing)

Test2
tps = 6698.456888 (excluding connections establishing)
tps = 7117.481643 (excluding connections establishing)
tps = 7053.070973 (excluding connections establishing)

Test 3
tps = 5137.229119 (excluding connections establishing)
tps = 5091.548602 (excluding connections establishing)
tps = 5175.683112 (excluding connections establishing)

Test 4
tps = 23.816356 (excluding connections establishing)
tps = 27.069840 (excluding connections establishing)
tps = 27.398995 (excluding connections establishing)

Test 5
tps = 54.209078 (excluding connections establishing)
tps = 54.022850 (excluding connections establishing)
tps = 54.076590 (excluding connections establishing)

Test 6
tps = 9328.134781 (excluding connections establishing)
tps = 9181.898271 (excluding connections establishing)
tps = 9402.993637 (excluding connections establishing)

Patched

Test 1
tps = 6560.160644 (excluding connections establishing)
tps = 6714.294344 (excluding connections establishing)
tps = 6901.381552 (excluding connections establishing)

Test 2
tps = 6971.106747 (excluding connections establishing)
tps = 6591.363565 (excluding connections establishing)
tps = 6921.572060 (excluding connections establishing)

Test 3
tps = 4784.606871 (excluding connections establishing)
tps = 4980.609259 (excluding connections establishing)
tps = 4954.237649 (excluding connections establishing)

Test 4
tps = 19.374184 (excluding connections establishing)
tps = 19.836406 (excluding connections establishing)
tps = 19.047484 (excluding connections establishing)

(perf of test 4)
+   40.20%    40.20%  postgres  postgres            [.] equal.part.18
+   29.57%     0.00%  postgres  [unknown]           [.] 0xffffffff00000017
+   25.15%     0.00%  postgres  [unknown]           [.] 0x00000000ffffffff
+   20.29%     0.00%  postgres  [unknown]           [k] 0000000000000000
+   12.36%    12.36%  postgres  postgres            [.] _equalList
+   11.25%     0.00%  postgres  [unknown]           [.] 0x0000558ed98fb148
+    8.53%     8.53%  postgres  postgres            [.] _equalArrayRef
+    6.22%     6.22%  postgres  postgres            [.] process_equivalence
     5.93%     5.93%  postgres  postgres            [.] get_eclass_for_sort_expr
+    2.75%     0.00%  postgres  [kernel.kallsyms]   [k] 0xffffffffaf06b772

Test 5
tps = 51.691502 (excluding connections establishing)
tps = 51.296724 (excluding connections establishing)
tps = 51.366643 (excluding connections establishing)

Test 6
tps = 9528.363239 (excluding connections establishing)
tps = 9478.202725 (excluding connections establishing)
tps = 9346.753395 (excluding connections establishing)

Result Comparison

Master median tps Patch median tps comparison
Test 1 6993.7 6714.3 104.16%
Test 2 7053.1 6921.6 101.90%
Test 3 5137.2 4954.2 103.69%
Test 4 27.1 19.4 139.72%
Test 5 54.1 51.4 105.28%
Test 6 9328.1 9478.2 98.42%

Results Analyzed:

Test 1 has caused planning to slow down 4.16%. There's quite a bit of
noise from the results, but I think this indicates there is some
overhead to having to add items to the cslist and searching the cslist
when new quals are seen.

Test 2 has a lesser slowdown than test 1, as this test will excercise
the existing rqlist caching in master too. Patched does a little more
work adding the equality condition to the list too.

Test 3 similar to test 1

Test 4 adds quite an overhead and causes 0.5 million comparisons to
find the expressions in the cslist.

Test 5 shows less overhead than test 4 since the Master code has to
also do the expression caching and searching.

Test 6 is a control test

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17David Rowley
david.rowley@2ndquadrant.com
In reply to: Claudio Freire (#15)
Re: Making clausesel.c Smarter

On 4 April 2017 at 13:35, Claudio Freire <klaussfreire@gmail.com> wrote:

On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfreire@gmail.com> wrote:

On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

One last observation:

+        /*
+         * An IS NOT NULL test is a no-op if there's any other strict quals,
+         * so if that's the case, then we'll only apply this, otherwise we'll
+         * ignore it.
+         */
+        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
+            s1 *= cslist->notnullsel;

In some paths, the range selectivity estimation code punts and returns
DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
present, it should still be applied. It could make a big difference if
the selectivity for the nulltest is high.

I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
should apply the nullfrac. Seems wrong to rely on a IS NOT NULL test
to exists to estimate that properly. I don't think that needs done as
part of this patch. I'd rather limit the scope since "returned with
feedback" is already knocking at the door of this patch.

I agree, but this patch regresses for those cases if the user
compensated with a not null test, leaving little recourse, so I'd fix
it in this patch to avoid the regression.

Maybe you're right that the null fraction should be applied regardless
of whether there's an explicit null check, though.

The more I read that code, the more I'm convinced there's no way to
get a DEFAULT_RANGE_INEQ_SEL if (rqlist->hibound != DEFAULT_INEQ_SEL
&&
rqlist->lobound != DEFAULT_INEQ_SEL) other than from contradictory
clauses, a case the current code handles very poorly, returning
DEFAULT_RANGE_INEQ_SEL and ignoring nullfrac, something that could
give way-off estimates if the table had lots of nulls.

Say, consider if "value" was mostly null and you write:

SELECT * FROM testdata WHERE value BETWEEN 10 AND 20 AND value BETWEEN
50 AND 60;

All other cases I can think of would end with either hibound or
lobound being equal to DEFAULT_INEQ_SEL.

It seems to me, doing a git blame, that the checks in the else branch
were left only as a precaution, one that seems unnecessary.

If you ask me, I'd just leave:

+ if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
DEFAULT_INEQ_SEL)
+ {
+     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
implies we have no stats */
+     s2 = DEFAULT_RANGE_INEQ_SEL;
+ }
+ else
+ {
...
+   s2 = rqlist->hibound + rqlist->lobound - 1.0;
+   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
+   notnullsel = 1.0 - nullsel;
+
+   /* Adjust for double-exclusion of NULLs */
+   s2 += nullsel;
+   if (s2 <= 0.0) {
+      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
+      {
+           /* Most likely contradictory clauses found */
+           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
+      }
+      else
+      {
+           /* Could be a rounding error */
+           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
+      }
+   }
+ }

Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
cautious) estimation of the amount of rounding error that could be
there with 32-bit floats.

The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
an estimation based on stats, maybe approximate, but one that makes
sense (ie: preserves the monotonicity of the CPF), and as such
negative values are either sign of a contradiction or rounding error.
The previous code left the possibility of negatives coming out of
default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
but that doesn't seem possible coming out of scalarineqsel.

But I'd like it if Tom could comment on that, since he's the one that
did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
in 2004).

Barring that, I'd just change the

s2 = DEFAULT_RANGE_INEQ_SEL;

To

s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;

Which makes a lot more sense.

I think to fix this properly the selfuncs would have to return the
estimate, and nullfrac separately, so the nullfrac could just be
applied once per expression. There's already hacks in there to get rid
of the double null filtering. I'm not proposing that as something for
this patch. It would be pretty invasive to change this.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Claudio Freire
klaussfreire@gmail.com
In reply to: David Rowley (#16)
Re: Making clausesel.c Smarter

On Tue, Apr 4, 2017 at 8:12 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

Result Comparison

Master median tps Patch median tps comparison
Test 1 6993.7 6714.3 104.16%
Test 2 7053.1 6921.6 101.90%
Test 3 5137.2 4954.2 103.69%
Test 4 27.1 19.4 139.72%
Test 5 54.1 51.4 105.28%
Test 6 9328.1 9478.2 98.42%

Results Analyzed:

Test 1 has caused planning to slow down 4.16%. There's quite a bit of
noise from the results, but I think this indicates there is some
overhead to having to add items to the cslist and searching the cslist
when new quals are seen.

Test 2 has a lesser slowdown than test 1, as this test will excercise
the existing rqlist caching in master too. Patched does a little more
work adding the equality condition to the list too.

Test 3 similar to test 1

Test 4 adds quite an overhead and causes 0.5 million comparisons to
find the expressions in the cslist.

Test 5 shows less overhead than test 4 since the Master code has to
also do the expression caching and searching.

Test 6 is a control test

That's consistent with the operation being quadratic.

While I was about to point it out, the old code was quadratic too, so
it sucks in both versions. This version only adds other opexprs to the
list and makes the quadratic cost easier to run into, but it's nothing
new.

I don't think there's an easy fix for that. You'd have to build a hash
table of expression nodes or something of the sort to avoid the
quadratic cost. If you can do that, by all means, but that's a much
bigger patch.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Claudio Freire
klaussfreire@gmail.com
In reply to: David Rowley (#17)
Re: Making clausesel.c Smarter

On Tue, Apr 4, 2017 at 8:21 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 4 April 2017 at 13:35, Claudio Freire <klaussfreire@gmail.com> wrote:

On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfreire@gmail.com> wrote:

On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

One last observation:

+        /*
+         * An IS NOT NULL test is a no-op if there's any other strict quals,
+         * so if that's the case, then we'll only apply this, otherwise we'll
+         * ignore it.
+         */
+        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
+            s1 *= cslist->notnullsel;

In some paths, the range selectivity estimation code punts and returns
DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
present, it should still be applied. It could make a big difference if
the selectivity for the nulltest is high.

I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
should apply the nullfrac. Seems wrong to rely on a IS NOT NULL test
to exists to estimate that properly. I don't think that needs done as
part of this patch. I'd rather limit the scope since "returned with
feedback" is already knocking at the door of this patch.

I agree, but this patch regresses for those cases if the user
compensated with a not null test, leaving little recourse, so I'd fix
it in this patch to avoid the regression.

Maybe you're right that the null fraction should be applied regardless
of whether there's an explicit null check, though.

The more I read that code, the more I'm convinced there's no way to
get a DEFAULT_RANGE_INEQ_SEL if (rqlist->hibound != DEFAULT_INEQ_SEL
&&
rqlist->lobound != DEFAULT_INEQ_SEL) other than from contradictory
clauses, a case the current code handles very poorly, returning
DEFAULT_RANGE_INEQ_SEL and ignoring nullfrac, something that could
give way-off estimates if the table had lots of nulls.

Say, consider if "value" was mostly null and you write:

SELECT * FROM testdata WHERE value BETWEEN 10 AND 20 AND value BETWEEN
50 AND 60;

All other cases I can think of would end with either hibound or
lobound being equal to DEFAULT_INEQ_SEL.

It seems to me, doing a git blame, that the checks in the else branch
were left only as a precaution, one that seems unnecessary.

If you ask me, I'd just leave:

+ if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
DEFAULT_INEQ_SEL)
+ {
+     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
implies we have no stats */
+     s2 = DEFAULT_RANGE_INEQ_SEL;
+ }
+ else
+ {
...
+   s2 = rqlist->hibound + rqlist->lobound - 1.0;
+   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
+   notnullsel = 1.0 - nullsel;
+
+   /* Adjust for double-exclusion of NULLs */
+   s2 += nullsel;
+   if (s2 <= 0.0) {
+      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
+      {
+           /* Most likely contradictory clauses found */
+           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
+      }
+      else
+      {
+           /* Could be a rounding error */
+           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
+      }
+   }
+ }

Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
cautious) estimation of the amount of rounding error that could be
there with 32-bit floats.

The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
an estimation based on stats, maybe approximate, but one that makes
sense (ie: preserves the monotonicity of the CPF), and as such
negative values are either sign of a contradiction or rounding error.
The previous code left the possibility of negatives coming out of
default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
but that doesn't seem possible coming out of scalarineqsel.

But I'd like it if Tom could comment on that, since he's the one that
did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
in 2004).

Barring that, I'd just change the

s2 = DEFAULT_RANGE_INEQ_SEL;

To

s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;

Which makes a lot more sense.

I think to fix this properly the selfuncs would have to return the
estimate, and nullfrac separately, so the nullfrac could just be
applied once per expression. There's already hacks in there to get rid
of the double null filtering. I'm not proposing that as something for
this patch. It would be pretty invasive to change this.

There's no need, you can compute the nullfrac with nulltestsel. While
there's some rework involved, it's not a big deal (it just reads the
stats tuple), and it's a clean solution.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Claudio Freire
klaussfreire@gmail.com
In reply to: Claudio Freire (#19)
Re: Making clausesel.c Smarter

On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire <klaussfreire@gmail.com> wrote:

If you ask me, I'd just leave:

+ if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
DEFAULT_INEQ_SEL)
+ {
+     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
implies we have no stats */
+     s2 = DEFAULT_RANGE_INEQ_SEL;
+ }
+ else
+ {
...
+   s2 = rqlist->hibound + rqlist->lobound - 1.0;
+   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
+   notnullsel = 1.0 - nullsel;
+
+   /* Adjust for double-exclusion of NULLs */
+   s2 += nullsel;
+   if (s2 <= 0.0) {
+      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
+      {
+           /* Most likely contradictory clauses found */
+           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
+      }
+      else
+      {
+           /* Could be a rounding error */
+           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
+      }
+   }
+ }

Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
cautious) estimation of the amount of rounding error that could be
there with 32-bit floats.

The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
an estimation based on stats, maybe approximate, but one that makes
sense (ie: preserves the monotonicity of the CPF), and as such
negative values are either sign of a contradiction or rounding error.
The previous code left the possibility of negatives coming out of
default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
but that doesn't seem possible coming out of scalarineqsel.

But I'd like it if Tom could comment on that, since he's the one that
did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
in 2004).

Barring that, I'd just change the

s2 = DEFAULT_RANGE_INEQ_SEL;

To

s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;

Which makes a lot more sense.

I think to fix this properly the selfuncs would have to return the
estimate, and nullfrac separately, so the nullfrac could just be
applied once per expression. There's already hacks in there to get rid
of the double null filtering. I'm not proposing that as something for
this patch. It would be pretty invasive to change this.

There's no need, you can compute the nullfrac with nulltestsel. While
there's some rework involved, it's not a big deal (it just reads the
stats tuple), and it's a clean solution.

I'm marking this Waiting on Author until we figure out what to do with
those issues (the null issue quoted above, and the quadratic behavior)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21David Rowley
david.rowley@2ndquadrant.com
In reply to: Claudio Freire (#20)
1 attachment(s)
Re: Making clausesel.c Smarter

On 7 April 2017 at 09:05, Claudio Freire <klaussfreire@gmail.com> wrote:

On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire <klaussfreire@gmail.com> wrote:

If you ask me, I'd just leave:

+ if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
DEFAULT_INEQ_SEL)
+ {
+     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
implies we have no stats */
+     s2 = DEFAULT_RANGE_INEQ_SEL;
+ }
+ else
+ {
...
+   s2 = rqlist->hibound + rqlist->lobound - 1.0;
+   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
+   notnullsel = 1.0 - nullsel;
+
+   /* Adjust for double-exclusion of NULLs */
+   s2 += nullsel;
+   if (s2 <= 0.0) {
+      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
+      {
+           /* Most likely contradictory clauses found */
+           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
+      }
+      else
+      {
+           /* Could be a rounding error */
+           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
+      }
+   }
+ }

Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
cautious) estimation of the amount of rounding error that could be
there with 32-bit floats.

The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
an estimation based on stats, maybe approximate, but one that makes
sense (ie: preserves the monotonicity of the CPF), and as such
negative values are either sign of a contradiction or rounding error.
The previous code left the possibility of negatives coming out of
default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
but that doesn't seem possible coming out of scalarineqsel.

But I'd like it if Tom could comment on that, since he's the one that
did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
in 2004).

Barring that, I'd just change the

s2 = DEFAULT_RANGE_INEQ_SEL;

To

s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;

Which makes a lot more sense.

I think to fix this properly the selfuncs would have to return the
estimate, and nullfrac separately, so the nullfrac could just be
applied once per expression. There's already hacks in there to get rid
of the double null filtering. I'm not proposing that as something for
this patch. It would be pretty invasive to change this.

There's no need, you can compute the nullfrac with nulltestsel. While
there's some rework involved, it's not a big deal (it just reads the
stats tuple), and it's a clean solution.

I'm marking this Waiting on Author until we figure out what to do with
those issues (the null issue quoted above, and the quadratic behavior)

I've attached a rebased patch as the existing one no longer applies.

I've not yet had time to wrap my head around the null handling stuff.
I'll try to get to that today.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

smarter_clausesel_2017-04-07.patchapplication/octet-stream; name=smarter_clausesel_2017-04-07.patchDownload
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 614c1d1..3af5b7c 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -24,23 +24,39 @@
 #include "utils/selfuncs.h"
 #include "statistics/statistics.h"
 
+#define CACHESEL_LOBOUND		0x0001	/* has var > something selectivity */
+#define CACHESEL_HIBOUND		0x0002	/* has var < something selectivity */
+#define CACHESEL_NULLTEST		0x0004	/* has var IS NULL selectivity */
+#define CACHESEL_NOTNULLTEST	0x0008	/* has var IS NOT NULL selectivity */
+#define CACHESEL_OTHERSTRICT	0x0010	/* has another OpExpr selectivity */
 
 /*
- * Data structure for accumulating info about possible range-query
- * clause pairs in clauselist_selectivity.
+ * Data structure for caching selectivity for clauselist_selectivity.
  */
-typedef struct RangeQueryClause
+typedef struct CachedSelectivityClause
 {
-	struct RangeQueryClause *next;		/* next in linked list */
+	struct CachedSelectivityClause *next;		/* next in linked list */
 	Node	   *var;			/* The common variable of the clauses */
-	bool		have_lobound;	/* found a low-bound clause yet? */
-	bool		have_hibound;	/* found a high-bound clause yet? */
+	int			selmask;		/* Bitmask of which sel types are stored */
 	Selectivity lobound;		/* Selectivity of a var > something clause */
 	Selectivity hibound;		/* Selectivity of a var < something clause */
-} RangeQueryClause;
-
-static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
-			   bool varonleft, bool isLTsel, Selectivity s2);
+	Selectivity nullsel;		/* Selectivity of a IS NULL test */
+	Selectivity notnullsel;		/* Selectivity of a IS NOT NULL test */
+	Selectivity otherstrictsel; /* Selectivity of any other strict clauses */
+} CachedSelectivityClause;
+
+
+static CachedSelectivityClause *findCachedSelectivityVar(
+						 CachedSelectivityClause **cslist, Node *expr);
+static void addCachedSelectivityNullTest(CachedSelectivityClause **cslist,
+							 Node *expr, Selectivity s2);
+static void addCachedSelectivityNotNullTest(CachedSelectivityClause **cslist,
+								Node *expr, Selectivity s2);
+static void addCachedSelectivityRangeVar(CachedSelectivityClause **cslist,
+				   Node *expr, bool varonleft, bool isLTsel, Selectivity s2);
+static void addCachedSelectivityOtherStrictClause(
+									  CachedSelectivityClause **cslist,
+									  Node *expr, Selectivity s2);
 static RelOptInfo *find_relation_from_clauses(PlannerInfo *root,
 											  List *clauses);
 
@@ -85,11 +101,22 @@ static RelOptInfo *find_relation_from_clauses(PlannerInfo *root,
  * hisel can be interpreted directly as a 0..1 value but we need to convert
  * losel to 1-losel before interpreting it as a value.  Then the available
  * range is 1-losel to hisel.  However, this calculation double-excludes
- * nulls, so really we need hisel + losel + null_frac - 1.)
+ * nulls, so really we need hisel + losel + null_frac - 1.).
+ *
+ * IS [NOT] NULL conditions are also handled in a special way. We attempt to
+ * cache all quals on a given expression and only apply the selectivity of an
+ * IS [NOT] NULL condition if no other qual on the same expression would
+ * have already accounted for nulls be filtered, for example;
+ *
+ * WHERE x = 1 AND x IS NOT NULL;
+ *
+ * the x = 1 will have already included filtering the NULL values during its
+ * selectivity estimate, if we went and applied the x IS NOT NULL selectivity
+ * again, then we'd end up underestimating the selectivity over both quals.
  *
- * If either selectivity is exactly DEFAULT_INEQ_SEL, we forget this equation
- * and instead use DEFAULT_RANGE_INEQ_SEL.  The same applies if the equation
- * yields an impossible (negative) result.
+ * For range pairs, if either selectivity is exactly DEFAULT_INEQ_SEL, we
+ * forget this equation and instead use DEFAULT_RANGE_INEQ_SEL.  The same
+ * applies if the equation yields an impossible (negative) result.
  *
  * A free side-effect is that we can recognize redundant inequalities such
  * as "x < 4 AND x < 5"; only the tighter constraint will be counted.
@@ -105,7 +132,7 @@ clauselist_selectivity(PlannerInfo *root,
 					   SpecialJoinInfo *sjinfo)
 {
 	Selectivity s1 = 1.0;
-	RangeQueryClause *rqlist = NULL;
+	CachedSelectivityClause *cslist = NULL;
 	ListCell   *l;
 	Bitmapset  *estimatedclauses = NULL;
 	int			listidx;
@@ -193,6 +220,20 @@ clauselist_selectivity(PlannerInfo *root,
 		else
 			rinfo = NULL;
 
+		if (IsA(clause, NullTest))
+		{
+			NullTestType nulltesttype = ((NullTest *) clause)->nulltesttype;
+
+			if (nulltesttype == IS_NULL)
+				addCachedSelectivityNullTest(&cslist,
+									(Node *) ((NullTest *) clause)->arg, s2);
+			else if (nulltesttype == IS_NOT_NULL)
+				addCachedSelectivityNotNullTest(&cslist,
+									(Node *) ((NullTest *) clause)->arg, s2);
+
+			continue;			/* drop to loop bottom */
+		}
+
 		/*
 		 * See if it looks like a restriction clause with a pseudoconstant on
 		 * one side.  (Anything more complicated than that might not behave in
@@ -204,26 +245,35 @@ clauselist_selectivity(PlannerInfo *root,
 			OpExpr	   *expr = (OpExpr *) clause;
 			bool		varonleft = true;
 			bool		ok;
+			Node	   *leftexpr = (Node *) linitial(expr->args);
+			Node	   *rightexpr = (Node *) lsecond(expr->args);
 
 			if (rinfo)
 			{
 				ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
-					(is_pseudo_constant_clause_relids(lsecond(expr->args),
+					(is_pseudo_constant_clause_relids(rightexpr,
 													  rinfo->right_relids) ||
 					 (varonleft = false,
-					  is_pseudo_constant_clause_relids(linitial(expr->args),
+					  is_pseudo_constant_clause_relids(leftexpr,
 													   rinfo->left_relids)));
 			}
 			else
 			{
 				ok = (NumRelids(clause) == 1) &&
-					(is_pseudo_constant_clause(lsecond(expr->args)) ||
+					(is_pseudo_constant_clause(rightexpr) ||
 					 (varonleft = false,
-					  is_pseudo_constant_clause(linitial(expr->args))));
+					  is_pseudo_constant_clause(leftexpr)));
 			}
 
 			if (ok)
 			{
+				Node	   *var;
+
+				if (varonleft)
+					var = leftexpr;
+				else
+					var = rightexpr;
+
 				/*
 				 * If it's not a "<" or ">" operator, just merge the
 				 * selectivity in generically.  But if it's the right oprrest,
@@ -232,16 +282,25 @@ clauselist_selectivity(PlannerInfo *root,
 				switch (get_oprrest(expr->opno))
 				{
 					case F_SCALARLTSEL:
-						addRangeClause(&rqlist, clause,
-									   varonleft, true, s2);
+						addCachedSelectivityRangeVar(&cslist, var,
+													 varonleft, true, s2);
 						break;
 					case F_SCALARGTSEL:
-						addRangeClause(&rqlist, clause,
-									   varonleft, false, s2);
+						addCachedSelectivityRangeVar(&cslist, var,
+													 varonleft, false, s2);
 						break;
 					default:
-						/* Just merge the selectivity in generically */
-						s1 = s1 * s2;
+
+						/*
+						 * Cache all strict clauses in selectivity other.
+						 * Anything non-strict we'll just apply the
+						 * selectivity now, since we're currently unable to do
+						 * anything particularly smart with it.
+						 */
+						if (op_strict(expr->opno))
+							addCachedSelectivityOtherStrictClause(&cslist, var, s2);
+						else
+							s1 = s1 * s2;
 						break;
 				}
 				continue;		/* drop to loop bottom */
@@ -253,176 +312,259 @@ clauselist_selectivity(PlannerInfo *root,
 	}
 
 	/*
-	 * Now scan the rangequery pair list.
+	 * Now scan the cached selectivity list
 	 */
-	while (rqlist != NULL)
+	while (cslist != NULL)
 	{
-		RangeQueryClause *rqnext;
+		CachedSelectivityClause *csnext;
 
-		if (rqlist->have_lobound && rqlist->have_hibound)
+		if ((cslist->selmask & CACHESEL_NULLTEST))
 		{
-			/* Successfully matched a pair of range clauses */
-			Selectivity s2;
-
 			/*
-			 * Exact equality to the default value probably means the
-			 * selectivity function punted.  This is not airtight but should
-			 * be good enough.
+			 * if null test is not the only flag then there can be no matching
+			 * rows at all.
 			 */
-			if (rqlist->hibound == DEFAULT_INEQ_SEL ||
-				rqlist->lobound == DEFAULT_INEQ_SEL)
+			if (cslist->selmask != CACHESEL_NULLTEST)
 			{
-				s2 = DEFAULT_RANGE_INEQ_SEL;
+				s1 = 0;
+				break;			/* nothing more needs estimated */
 			}
 			else
-			{
-				s2 = rqlist->hibound + rqlist->lobound - 1.0;
+				s1 *= cslist->nullsel;
+		}
 
-				/* Adjust for double-exclusion of NULLs */
-				s2 += nulltestsel(root, IS_NULL, rqlist->var,
-								  varRelid, jointype, sjinfo);
+		/*
+		 * An IS NOT NULL test is a no-op if there's any other strict quals,
+		 * so if that's the case, then we'll only apply this, otherwise we'll
+		 * ignore it.
+		 */
+		else if (cslist->selmask == CACHESEL_NOTNULLTEST)
+			s1 *= cslist->notnullsel;
+
+		else
+		{
+			/* Check if both lobound and hibound were seen */
+			if ((cslist->selmask & (CACHESEL_LOBOUND | CACHESEL_HIBOUND)) ==
+				(CACHESEL_LOBOUND | CACHESEL_HIBOUND))
+			{
+				/* Successfully matched a pair of range clauses */
+				Selectivity s2;
 
 				/*
-				 * A zero or slightly negative s2 should be converted into a
-				 * small positive value; we probably are dealing with a very
-				 * tight range and got a bogus result due to roundoff errors.
-				 * However, if s2 is very negative, then we probably have
-				 * default selectivity estimates on one or both sides of the
-				 * range that we failed to recognize above for some reason.
+				 * Exact equality to the default value probably means the
+				 * selectivity function punted.  This is not airtight but
+				 * should be good enough.
 				 */
-				if (s2 <= 0.0)
+				if (cslist->hibound == DEFAULT_INEQ_SEL ||
+					cslist->lobound == DEFAULT_INEQ_SEL)
 				{
-					if (s2 < -0.01)
-					{
-						/*
-						 * No data available --- use a default estimate that
-						 * is small, but not real small.
-						 */
-						s2 = DEFAULT_RANGE_INEQ_SEL;
-					}
-					else
+					s2 = DEFAULT_RANGE_INEQ_SEL;
+				}
+				else
+				{
+					s2 = cslist->hibound + cslist->lobound - 1.0;
+
+					/* Adjust for double-exclusion of NULLs */
+					s2 += nulltestsel(root, IS_NULL, cslist->var,
+									  varRelid, jointype, sjinfo);
+
+					/*
+					 * A zero or slightly negative s2 should be converted into
+					 * a small positive value; we probably are dealing with a
+					 * very tight range and got a bogus result due to roundoff
+					 * errors. However, if s2 is very negative, then we
+					 * probably have default selectivity estimates on one or
+					 * both sides of the range that we failed to recognize
+					 * above for some reason.
+					 */
+					if (s2 <= 0.0)
 					{
-						/*
-						 * It's just roundoff error; use a small positive
-						 * value
-						 */
-						s2 = 1.0e-10;
+						if (s2 < -0.01)
+						{
+							/*
+							 * No data available --- use a default estimate
+							 * that is small, but not real small.
+							 */
+							s2 = DEFAULT_RANGE_INEQ_SEL;
+						}
+						else
+						{
+							/*
+							 * It's just roundoff error; use a small positive
+							 * value
+							 */
+							s2 = 1.0e-10;
+						}
 					}
 				}
+				/* Merge in the selectivity of the pair of clauses */
+				s1 *= s2;
 			}
-			/* Merge in the selectivity of the pair of clauses */
-			s1 *= s2;
-		}
-		else
-		{
-			/* Only found one of a pair, merge it in generically */
-			if (rqlist->have_lobound)
-				s1 *= rqlist->lobound;
 			else
-				s1 *= rqlist->hibound;
+			{
+				/* Only found one of a range pair, merge it in generically */
+				if ((cslist->selmask & CACHESEL_LOBOUND))
+					s1 *= cslist->lobound;
+				else if ((cslist->selmask & CACHESEL_HIBOUND))
+					s1 *= cslist->hibound;
+			}
+
+			/* apply the selectivity for any other seen strict qual */
+			if ((cslist->selmask & CACHESEL_OTHERSTRICT))
+				s1 *= cslist->otherstrictsel;
 		}
+
 		/* release storage and advance */
-		rqnext = rqlist->next;
-		pfree(rqlist);
-		rqlist = rqnext;
+		csnext = cslist->next;
+		pfree(cslist);
+		cslist = csnext;
 	}
 
 	return s1;
 }
 
 /*
- * addRangeClause --- add a new range clause for clauselist_selectivity
+ * findCachedSelectivityVar
+ *	Find existing seletivity var, or add this var to the list.
+ */
+static CachedSelectivityClause *
+findCachedSelectivityVar(CachedSelectivityClause **cslist, Node *expr)
+{
+	CachedSelectivityClause *cselem;
+
+	for (cselem = *cslist; cselem; cselem = cselem->next)
+	{
+		/*
+		 * We use full equal() here because the "var" might be a function of
+		 * one or more attributes of the same relation...
+		 */
+		if (equal(expr, cselem->var))
+			return cselem;
+	}
+
+	/* not found -- add it */
+	cselem = (CachedSelectivityClause *) palloc(sizeof(CachedSelectivityClause));
+	cselem->var = expr;
+	cselem->selmask = 0;
+
+	cselem->next = *cslist;
+	*cslist = cselem;
+	return cselem;
+}
+
+
+/*
+ * addCachedSelectivityNullTest
+ *		Cache selectivity for an IS NULL test.
+ */
+static void
+addCachedSelectivityNullTest(CachedSelectivityClause **cslist, Node *expr,
+							 Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	/* We can simply overwrite any previously cached selectivity here */
+	cselem->nullsel = s2;
+	cselem->selmask |= CACHESEL_NULLTEST;
+}
+
+/*
+ * addCachedSelectivityNotNullTest
+ *		Cache selectivity for an IS NOT NULL test.
+ */
+static void
+addCachedSelectivityNotNullTest(CachedSelectivityClause **cslist, Node *expr,
+								Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	/* We can simply overwrite any previously cached selectivity here */
+	cselem->notnullsel = s2;
+	cselem->selmask |= CACHESEL_NOTNULLTEST;
+}
+
+/*
+ * addCachedSelectivityRangeVar
+ *		add a new range clause for clauselist_selectivity
  *
  * Here is where we try to match up pairs of range-query clauses
  */
 static void
-addRangeClause(RangeQueryClause **rqlist, Node *clause,
-			   bool varonleft, bool isLTsel, Selectivity s2)
+addCachedSelectivityRangeVar(CachedSelectivityClause **cslist, Node *expr,
+							 bool varonleft, bool isLTsel, Selectivity s2)
 {
-	RangeQueryClause *rqelem;
-	Node	   *var;
+	CachedSelectivityClause *cselem;
 	bool		is_lobound;
 
-	if (varonleft)
+	is_lobound = (varonleft != isLTsel);
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	if (is_lobound)
 	{
-		var = get_leftop((Expr *) clause);
-		is_lobound = !isLTsel;	/* x < something is high bound */
+		if (!(cselem->selmask & CACHESEL_LOBOUND))
+		{
+			cselem->selmask |= CACHESEL_LOBOUND;
+			cselem->lobound = s2;
+		}
+		else
+		{
+			/*------
+			 * We have found two similar clauses, such as
+			 * x < y AND x < z.
+			 * Keep only the more restrictive one.
+			 *------
+			 */
+			if (cselem->lobound > s2)
+				cselem->lobound = s2;
+		}
 	}
 	else
 	{
-		var = get_rightop((Expr *) clause);
-		is_lobound = isLTsel;	/* something < x is low bound */
-	}
-
-	for (rqelem = *rqlist; rqelem; rqelem = rqelem->next)
-	{
-		/*
-		 * We use full equal() here because the "var" might be a function of
-		 * one or more attributes of the same relation...
-		 */
-		if (!equal(var, rqelem->var))
-			continue;
-		/* Found the right group to put this clause in */
-		if (is_lobound)
+		if (!(cselem->selmask & CACHESEL_HIBOUND))
 		{
-			if (!rqelem->have_lobound)
-			{
-				rqelem->have_lobound = true;
-				rqelem->lobound = s2;
-			}
-			else
-			{
-
-				/*------
-				 * We have found two similar clauses, such as
-				 * x < y AND x < z.
-				 * Keep only the more restrictive one.
-				 *------
-				 */
-				if (rqelem->lobound > s2)
-					rqelem->lobound = s2;
-			}
+			cselem->selmask |= CACHESEL_HIBOUND;
+			cselem->hibound = s2;
 		}
 		else
 		{
-			if (!rqelem->have_hibound)
-			{
-				rqelem->have_hibound = true;
-				rqelem->hibound = s2;
-			}
-			else
-			{
 
-				/*------
-				 * We have found two similar clauses, such as
-				 * x > y AND x > z.
-				 * Keep only the more restrictive one.
-				 *------
-				 */
-				if (rqelem->hibound > s2)
-					rqelem->hibound = s2;
-			}
+			/*------
+			 * We have found two similar clauses, such as
+			 * x > y AND x > z.
+			 * Keep only the more restrictive one.
+			 *------
+			 */
+			if (cselem->hibound > s2)
+				cselem->hibound = s2;
 		}
-		return;
 	}
+}
 
-	/* No matching var found, so make a new clause-pair data structure */
-	rqelem = (RangeQueryClause *) palloc(sizeof(RangeQueryClause));
-	rqelem->var = var;
-	if (is_lobound)
-	{
-		rqelem->have_lobound = true;
-		rqelem->have_hibound = false;
-		rqelem->lobound = s2;
-	}
+/*
+ * addCachedSelectivityOtherStrictClause
+ *		Cache the selectivity of other OpExpr type expressions which are
+ *		strict
+ */
+static void
+addCachedSelectivityOtherStrictClause(CachedSelectivityClause **cslist, Node *expr,
+									  Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	if ((cselem->selmask & CACHESEL_OTHERSTRICT))
+		cselem->otherstrictsel = cselem->otherstrictsel * s2;
 	else
 	{
-		rqelem->have_lobound = false;
-		rqelem->have_hibound = true;
-		rqelem->hibound = s2;
+		cselem->otherstrictsel = s2;
+		cselem->selmask |= CACHESEL_OTHERSTRICT;
 	}
-	rqelem->next = *rqlist;
-	*rqlist = rqelem;
 }
 
 /*
#22David Rowley
david.rowley@2ndquadrant.com
In reply to: Claudio Freire (#15)
1 attachment(s)
Re: Making clausesel.c Smarter

On 4 April 2017 at 13:35, Claudio Freire <klaussfreire@gmail.com> wrote:

On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
If you ask me, I'd just leave:

+ if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
DEFAULT_INEQ_SEL)
+ {
+     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
implies we have no stats */
+     s2 = DEFAULT_RANGE_INEQ_SEL;
+ }
+ else
+ {
...
+   s2 = rqlist->hibound + rqlist->lobound - 1.0;
+   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
+   notnullsel = 1.0 - nullsel;
+
+   /* Adjust for double-exclusion of NULLs */
+   s2 += nullsel;
+   if (s2 <= 0.0) {
+      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
+      {
+           /* Most likely contradictory clauses found */
+           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
+      }
+      else
+      {
+           /* Could be a rounding error */
+           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
+      }
+   }
+ }

Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
cautious) estimation of the amount of rounding error that could be
there with 32-bit floats.

The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
an estimation based on stats, maybe approximate, but one that makes
sense (ie: preserves the monotonicity of the CPF), and as such
negative values are either sign of a contradiction or rounding error.
The previous code left the possibility of negatives coming out of
default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
but that doesn't seem possible coming out of scalarineqsel.

I notice this does change the estimates for contradictory clauses such as:

create table a (value int);
insert into a select x/100 from generate_Series(1,10000) x;
analyze a;
explain analyze select * from a where value between 101 and -1;

We now get:

QUERY PLAN
---------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..195.00 rows=1 width=4) (actual
time=1.904..1.904 rows=0 loops=1)
Filter: ((value >= 101) AND (value <= '-1'::integer))
Rows Removed by Filter: 10000
Planning time: 0.671 ms
Execution time: 1.950 ms
(5 rows)

where before we'd get:

QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..195.00 rows=50 width=4) (actual
time=0.903..0.903 rows=0 loops=1)
Filter: ((value >= 101) AND (value <= '-1'::integer))
Rows Removed by Filter: 10000
Planning time: 0.162 ms
Execution time: 0.925 ms
(5 rows)

Which comes from the 10000 * 0.005 selectivity estimate from tuples *
DEFAULT_RANGE_INEQ_SEL

I've attached a patch to this effect.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

smarter_clausesel_2017-04-07a.patchapplication/octet-stream; name=smarter_clausesel_2017-04-07a.patchDownload
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 614c1d1..17ab749 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -24,23 +24,39 @@
 #include "utils/selfuncs.h"
 #include "statistics/statistics.h"
 
+#define CACHESEL_LOBOUND		0x0001	/* has var > something selectivity */
+#define CACHESEL_HIBOUND		0x0002	/* has var < something selectivity */
+#define CACHESEL_NULLTEST		0x0004	/* has var IS NULL selectivity */
+#define CACHESEL_NOTNULLTEST	0x0008	/* has var IS NOT NULL selectivity */
+#define CACHESEL_OTHERSTRICT	0x0010	/* has another OpExpr selectivity */
 
 /*
- * Data structure for accumulating info about possible range-query
- * clause pairs in clauselist_selectivity.
+ * Data structure for caching selectivity for clauselist_selectivity.
  */
-typedef struct RangeQueryClause
+typedef struct CachedSelectivityClause
 {
-	struct RangeQueryClause *next;		/* next in linked list */
+	struct CachedSelectivityClause *next;		/* next in linked list */
 	Node	   *var;			/* The common variable of the clauses */
-	bool		have_lobound;	/* found a low-bound clause yet? */
-	bool		have_hibound;	/* found a high-bound clause yet? */
+	int			selmask;		/* Bitmask of which sel types are stored */
 	Selectivity lobound;		/* Selectivity of a var > something clause */
 	Selectivity hibound;		/* Selectivity of a var < something clause */
-} RangeQueryClause;
-
-static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
-			   bool varonleft, bool isLTsel, Selectivity s2);
+	Selectivity nullsel;		/* Selectivity of a IS NULL test */
+	Selectivity notnullsel;		/* Selectivity of a IS NOT NULL test */
+	Selectivity otherstrictsel; /* Selectivity of any other strict clauses */
+} CachedSelectivityClause;
+
+
+static CachedSelectivityClause *findCachedSelectivityVar(
+						 CachedSelectivityClause **cslist, Node *expr);
+static void addCachedSelectivityNullTest(CachedSelectivityClause **cslist,
+							 Node *expr, Selectivity s2);
+static void addCachedSelectivityNotNullTest(CachedSelectivityClause **cslist,
+								Node *expr, Selectivity s2);
+static void addCachedSelectivityRangeVar(CachedSelectivityClause **cslist,
+				   Node *expr, bool varonleft, bool isLTsel, Selectivity s2);
+static void addCachedSelectivityOtherStrictClause(
+									  CachedSelectivityClause **cslist,
+									  Node *expr, Selectivity s2);
 static RelOptInfo *find_relation_from_clauses(PlannerInfo *root,
 											  List *clauses);
 
@@ -85,11 +101,22 @@ static RelOptInfo *find_relation_from_clauses(PlannerInfo *root,
  * hisel can be interpreted directly as a 0..1 value but we need to convert
  * losel to 1-losel before interpreting it as a value.  Then the available
  * range is 1-losel to hisel.  However, this calculation double-excludes
- * nulls, so really we need hisel + losel + null_frac - 1.)
+ * nulls, so really we need hisel + losel + null_frac - 1.).
+ *
+ * IS [NOT] NULL conditions are also handled in a special way. We attempt to
+ * cache all quals on a given expression and only apply the selectivity of an
+ * IS [NOT] NULL condition if no other qual on the same expression would
+ * have already accounted for nulls be filtered, for example;
+ *
+ * WHERE x = 1 AND x IS NOT NULL;
+ *
+ * the x = 1 will have already included filtering the NULL values during its
+ * selectivity estimate, if we went and applied the x IS NOT NULL selectivity
+ * again, then we'd end up underestimating the selectivity over both quals.
  *
- * If either selectivity is exactly DEFAULT_INEQ_SEL, we forget this equation
- * and instead use DEFAULT_RANGE_INEQ_SEL.  The same applies if the equation
- * yields an impossible (negative) result.
+ * For range pairs, if either selectivity is exactly DEFAULT_INEQ_SEL, we
+ * forget this equation and instead use DEFAULT_RANGE_INEQ_SEL.  The same
+ * applies if the equation yields an impossible (negative) result.
  *
  * A free side-effect is that we can recognize redundant inequalities such
  * as "x < 4 AND x < 5"; only the tighter constraint will be counted.
@@ -105,7 +132,7 @@ clauselist_selectivity(PlannerInfo *root,
 					   SpecialJoinInfo *sjinfo)
 {
 	Selectivity s1 = 1.0;
-	RangeQueryClause *rqlist = NULL;
+	CachedSelectivityClause *cslist = NULL;
 	ListCell   *l;
 	Bitmapset  *estimatedclauses = NULL;
 	int			listidx;
@@ -193,6 +220,20 @@ clauselist_selectivity(PlannerInfo *root,
 		else
 			rinfo = NULL;
 
+		if (IsA(clause, NullTest))
+		{
+			NullTestType nulltesttype = ((NullTest *) clause)->nulltesttype;
+
+			if (nulltesttype == IS_NULL)
+				addCachedSelectivityNullTest(&cslist,
+									(Node *) ((NullTest *) clause)->arg, s2);
+			else if (nulltesttype == IS_NOT_NULL)
+				addCachedSelectivityNotNullTest(&cslist,
+									(Node *) ((NullTest *) clause)->arg, s2);
+
+			continue;			/* drop to loop bottom */
+		}
+
 		/*
 		 * See if it looks like a restriction clause with a pseudoconstant on
 		 * one side.  (Anything more complicated than that might not behave in
@@ -204,26 +245,35 @@ clauselist_selectivity(PlannerInfo *root,
 			OpExpr	   *expr = (OpExpr *) clause;
 			bool		varonleft = true;
 			bool		ok;
+			Node	   *leftexpr = (Node *) linitial(expr->args);
+			Node	   *rightexpr = (Node *) lsecond(expr->args);
 
 			if (rinfo)
 			{
 				ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
-					(is_pseudo_constant_clause_relids(lsecond(expr->args),
+					(is_pseudo_constant_clause_relids(rightexpr,
 													  rinfo->right_relids) ||
 					 (varonleft = false,
-					  is_pseudo_constant_clause_relids(linitial(expr->args),
+					  is_pseudo_constant_clause_relids(leftexpr,
 													   rinfo->left_relids)));
 			}
 			else
 			{
 				ok = (NumRelids(clause) == 1) &&
-					(is_pseudo_constant_clause(lsecond(expr->args)) ||
+					(is_pseudo_constant_clause(rightexpr) ||
 					 (varonleft = false,
-					  is_pseudo_constant_clause(linitial(expr->args))));
+					  is_pseudo_constant_clause(leftexpr)));
 			}
 
 			if (ok)
 			{
+				Node	   *var;
+
+				if (varonleft)
+					var = leftexpr;
+				else
+					var = rightexpr;
+
 				/*
 				 * If it's not a "<" or ">" operator, just merge the
 				 * selectivity in generically.  But if it's the right oprrest,
@@ -232,16 +282,25 @@ clauselist_selectivity(PlannerInfo *root,
 				switch (get_oprrest(expr->opno))
 				{
 					case F_SCALARLTSEL:
-						addRangeClause(&rqlist, clause,
-									   varonleft, true, s2);
+						addCachedSelectivityRangeVar(&cslist, var,
+													 varonleft, true, s2);
 						break;
 					case F_SCALARGTSEL:
-						addRangeClause(&rqlist, clause,
-									   varonleft, false, s2);
+						addCachedSelectivityRangeVar(&cslist, var,
+													 varonleft, false, s2);
 						break;
 					default:
-						/* Just merge the selectivity in generically */
-						s1 = s1 * s2;
+
+						/*
+						 * Cache all strict clauses in selectivity other.
+						 * Anything non-strict we'll just apply the
+						 * selectivity now, since we're currently unable to do
+						 * anything particularly smart with it.
+						 */
+						if (op_strict(expr->opno))
+							addCachedSelectivityOtherStrictClause(&cslist, var, s2);
+						else
+							s1 = s1 * s2;
 						break;
 				}
 				continue;		/* drop to loop bottom */
@@ -253,176 +312,259 @@ clauselist_selectivity(PlannerInfo *root,
 	}
 
 	/*
-	 * Now scan the rangequery pair list.
+	 * Now scan the cached selectivity list
 	 */
-	while (rqlist != NULL)
+	while (cslist != NULL)
 	{
-		RangeQueryClause *rqnext;
+		CachedSelectivityClause *csnext;
 
-		if (rqlist->have_lobound && rqlist->have_hibound)
+		if ((cslist->selmask & CACHESEL_NULLTEST))
 		{
-			/* Successfully matched a pair of range clauses */
-			Selectivity s2;
-
 			/*
-			 * Exact equality to the default value probably means the
-			 * selectivity function punted.  This is not airtight but should
-			 * be good enough.
+			 * if null test is not the only flag then there can be no matching
+			 * rows at all.
 			 */
-			if (rqlist->hibound == DEFAULT_INEQ_SEL ||
-				rqlist->lobound == DEFAULT_INEQ_SEL)
+			if (cslist->selmask != CACHESEL_NULLTEST)
 			{
-				s2 = DEFAULT_RANGE_INEQ_SEL;
+				s1 = 0;
+				break;			/* nothing more needs estimated */
 			}
 			else
-			{
-				s2 = rqlist->hibound + rqlist->lobound - 1.0;
+				s1 *= cslist->nullsel;
+		}
 
-				/* Adjust for double-exclusion of NULLs */
-				s2 += nulltestsel(root, IS_NULL, rqlist->var,
-								  varRelid, jointype, sjinfo);
+		/*
+		 * An IS NOT NULL test is a no-op if there's any other strict quals,
+		 * so if that's the case, then we'll only apply this, otherwise we'll
+		 * ignore it.
+		 */
+		else if (cslist->selmask == CACHESEL_NOTNULLTEST)
+			s1 *= cslist->notnullsel;
+
+		else
+		{
+			/* Check if both lobound and hibound were seen */
+			if ((cslist->selmask & (CACHESEL_LOBOUND | CACHESEL_HIBOUND)) ==
+				(CACHESEL_LOBOUND | CACHESEL_HIBOUND))
+			{
+				/* Successfully matched a pair of range clauses */
+				Selectivity s2;
 
 				/*
-				 * A zero or slightly negative s2 should be converted into a
-				 * small positive value; we probably are dealing with a very
-				 * tight range and got a bogus result due to roundoff errors.
-				 * However, if s2 is very negative, then we probably have
-				 * default selectivity estimates on one or both sides of the
-				 * range that we failed to recognize above for some reason.
+				 * Exact equality to the default value probably means the
+				 * selectivity function punted.  This is not airtight but
+				 * should be good enough.
 				 */
-				if (s2 <= 0.0)
+				if (cslist->hibound == DEFAULT_INEQ_SEL ||
+					cslist->lobound == DEFAULT_INEQ_SEL)
 				{
-					if (s2 < -0.01)
-					{
-						/*
-						 * No data available --- use a default estimate that
-						 * is small, but not real small.
-						 */
-						s2 = DEFAULT_RANGE_INEQ_SEL;
-					}
-					else
+					s2 = DEFAULT_RANGE_INEQ_SEL;
+				}
+				else
+				{
+					Selectivity nullsel;
+					Selectivity notnullsel;
+
+					s2 = cslist->hibound + cslist->lobound - 1.0;
+
+					nullsel = nulltestsel(root, IS_NULL, cslist->var, varRelid,
+										  jointype, sjinfo);
+					notnullsel = 1.0 - nullsel;
+
+					/* Adjust for double-exclusion of NULLs */
+					s2 += nullsel;
+
+					/*
+					 * A zero or slightly negative s2 should be converted into
+					 * a small positive value; we probably are dealing with a
+					 * very tight range and got a bogus result due to roundoff
+					 * errors. However, if s2 is very negative, then we
+					 * probably have default selectivity estimates on one or
+					 * both sides of the range that we failed to recognize
+					 * above for some reason.
+					 */
+					if (s2 <= 0.0)
 					{
-						/*
-						 * It's just roundoff error; use a small positive
-						 * value
-						 */
-						s2 = 1.0e-10;
+						if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
+						{
+							/* Most likely contradictory clauses found */
+							s2 = Min(1.0e-10, notnullsel);
+						}
+						else
+						{
+							/* Could be a rounding error */
+							s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
+						}
 					}
 				}
+				/* Merge in the selectivity of the pair of clauses */
+				s1 *= s2;
 			}
-			/* Merge in the selectivity of the pair of clauses */
-			s1 *= s2;
-		}
-		else
-		{
-			/* Only found one of a pair, merge it in generically */
-			if (rqlist->have_lobound)
-				s1 *= rqlist->lobound;
 			else
-				s1 *= rqlist->hibound;
+			{
+				/* Only found one of a range pair, merge it in generically */
+				if ((cslist->selmask & CACHESEL_LOBOUND))
+					s1 *= cslist->lobound;
+				else if ((cslist->selmask & CACHESEL_HIBOUND))
+					s1 *= cslist->hibound;
+			}
+
+			/* apply the selectivity for any other seen strict qual */
+			if ((cslist->selmask & CACHESEL_OTHERSTRICT))
+				s1 *= cslist->otherstrictsel;
 		}
+
 		/* release storage and advance */
-		rqnext = rqlist->next;
-		pfree(rqlist);
-		rqlist = rqnext;
+		csnext = cslist->next;
+		pfree(cslist);
+		cslist = csnext;
 	}
 
 	return s1;
 }
 
 /*
- * addRangeClause --- add a new range clause for clauselist_selectivity
+ * findCachedSelectivityVar
+ *	Find existing seletivity var, or add this var to the list.
+ */
+static CachedSelectivityClause *
+findCachedSelectivityVar(CachedSelectivityClause **cslist, Node *expr)
+{
+	CachedSelectivityClause *cselem;
+
+	for (cselem = *cslist; cselem; cselem = cselem->next)
+	{
+		/*
+		 * We use full equal() here because the "var" might be a function of
+		 * one or more attributes of the same relation...
+		 */
+		if (equal(expr, cselem->var))
+			return cselem;
+	}
+
+	/* not found -- add it */
+	cselem = (CachedSelectivityClause *) palloc(sizeof(CachedSelectivityClause));
+	cselem->var = expr;
+	cselem->selmask = 0;
+
+	cselem->next = *cslist;
+	*cslist = cselem;
+	return cselem;
+}
+
+
+/*
+ * addCachedSelectivityNullTest
+ *		Cache selectivity for an IS NULL test.
+ */
+static void
+addCachedSelectivityNullTest(CachedSelectivityClause **cslist, Node *expr,
+							 Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	/* We can simply overwrite any previously cached selectivity here */
+	cselem->nullsel = s2;
+	cselem->selmask |= CACHESEL_NULLTEST;
+}
+
+/*
+ * addCachedSelectivityNotNullTest
+ *		Cache selectivity for an IS NOT NULL test.
+ */
+static void
+addCachedSelectivityNotNullTest(CachedSelectivityClause **cslist, Node *expr,
+								Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	/* We can simply overwrite any previously cached selectivity here */
+	cselem->notnullsel = s2;
+	cselem->selmask |= CACHESEL_NOTNULLTEST;
+}
+
+/*
+ * addCachedSelectivityRangeVar
+ *		add a new range clause for clauselist_selectivity
  *
  * Here is where we try to match up pairs of range-query clauses
  */
 static void
-addRangeClause(RangeQueryClause **rqlist, Node *clause,
-			   bool varonleft, bool isLTsel, Selectivity s2)
+addCachedSelectivityRangeVar(CachedSelectivityClause **cslist, Node *expr,
+							 bool varonleft, bool isLTsel, Selectivity s2)
 {
-	RangeQueryClause *rqelem;
-	Node	   *var;
+	CachedSelectivityClause *cselem;
 	bool		is_lobound;
 
-	if (varonleft)
+	is_lobound = (varonleft != isLTsel);
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	if (is_lobound)
 	{
-		var = get_leftop((Expr *) clause);
-		is_lobound = !isLTsel;	/* x < something is high bound */
+		if (!(cselem->selmask & CACHESEL_LOBOUND))
+		{
+			cselem->selmask |= CACHESEL_LOBOUND;
+			cselem->lobound = s2;
+		}
+		else
+		{
+			/*------
+			 * We have found two similar clauses, such as
+			 * x < y AND x < z.
+			 * Keep only the more restrictive one.
+			 *------
+			 */
+			if (cselem->lobound > s2)
+				cselem->lobound = s2;
+		}
 	}
 	else
 	{
-		var = get_rightop((Expr *) clause);
-		is_lobound = isLTsel;	/* something < x is low bound */
-	}
-
-	for (rqelem = *rqlist; rqelem; rqelem = rqelem->next)
-	{
-		/*
-		 * We use full equal() here because the "var" might be a function of
-		 * one or more attributes of the same relation...
-		 */
-		if (!equal(var, rqelem->var))
-			continue;
-		/* Found the right group to put this clause in */
-		if (is_lobound)
+		if (!(cselem->selmask & CACHESEL_HIBOUND))
 		{
-			if (!rqelem->have_lobound)
-			{
-				rqelem->have_lobound = true;
-				rqelem->lobound = s2;
-			}
-			else
-			{
-
-				/*------
-				 * We have found two similar clauses, such as
-				 * x < y AND x < z.
-				 * Keep only the more restrictive one.
-				 *------
-				 */
-				if (rqelem->lobound > s2)
-					rqelem->lobound = s2;
-			}
+			cselem->selmask |= CACHESEL_HIBOUND;
+			cselem->hibound = s2;
 		}
 		else
 		{
-			if (!rqelem->have_hibound)
-			{
-				rqelem->have_hibound = true;
-				rqelem->hibound = s2;
-			}
-			else
-			{
 
-				/*------
-				 * We have found two similar clauses, such as
-				 * x > y AND x > z.
-				 * Keep only the more restrictive one.
-				 *------
-				 */
-				if (rqelem->hibound > s2)
-					rqelem->hibound = s2;
-			}
+			/*------
+			 * We have found two similar clauses, such as
+			 * x > y AND x > z.
+			 * Keep only the more restrictive one.
+			 *------
+			 */
+			if (cselem->hibound > s2)
+				cselem->hibound = s2;
 		}
-		return;
 	}
+}
 
-	/* No matching var found, so make a new clause-pair data structure */
-	rqelem = (RangeQueryClause *) palloc(sizeof(RangeQueryClause));
-	rqelem->var = var;
-	if (is_lobound)
-	{
-		rqelem->have_lobound = true;
-		rqelem->have_hibound = false;
-		rqelem->lobound = s2;
-	}
+/*
+ * addCachedSelectivityOtherStrictClause
+ *		Cache the selectivity of other OpExpr type expressions which are
+ *		strict
+ */
+static void
+addCachedSelectivityOtherStrictClause(CachedSelectivityClause **cslist, Node *expr,
+									  Selectivity s2)
+{
+	CachedSelectivityClause *cselem;
+
+	cselem = findCachedSelectivityVar(cslist, expr);
+
+	if ((cselem->selmask & CACHESEL_OTHERSTRICT))
+		cselem->otherstrictsel = cselem->otherstrictsel * s2;
 	else
 	{
-		rqelem->have_lobound = false;
-		rqelem->have_hibound = true;
-		rqelem->hibound = s2;
+		cselem->otherstrictsel = s2;
+		cselem->selmask |= CACHESEL_OTHERSTRICT;
 	}
-	rqelem->next = *rqlist;
-	*rqlist = rqelem;
 }
 
 /*
#23Claudio Freire
klaussfreire@gmail.com
In reply to: David Rowley (#22)
Re: Making clausesel.c Smarter

On Fri, Apr 7, 2017 at 2:28 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

+ if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
DEFAULT_INEQ_SEL)
+ {
+     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
implies we have no stats */
+     s2 = DEFAULT_RANGE_INEQ_SEL;
+ }
+ else
+ {
...
+   s2 = rqlist->hibound + rqlist->lobound - 1.0;
+   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
+   notnullsel = 1.0 - nullsel;
+
+   /* Adjust for double-exclusion of NULLs */
+   s2 += nullsel;
+   if (s2 <= 0.0) {
+      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
+      {
+           /* Most likely contradictory clauses found */
+           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
+      }
+      else
+      {
+           /* Could be a rounding error */
+           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
+      }
+   }
+ }

Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
cautious) estimation of the amount of rounding error that could be
there with 32-bit floats.

The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
an estimation based on stats, maybe approximate, but one that makes
sense (ie: preserves the monotonicity of the CPF), and as such
negative values are either sign of a contradiction or rounding error.
The previous code left the possibility of negatives coming out of
default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
but that doesn't seem possible coming out of scalarineqsel.

I notice this does change the estimates for contradictory clauses such as:

create table a (value int);
insert into a select x/100 from generate_Series(1,10000) x;
analyze a;
explain analyze select * from a where value between 101 and -1;

We now get:

QUERY PLAN
---------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..195.00 rows=1 width=4) (actual
time=1.904..1.904 rows=0 loops=1)
Filter: ((value >= 101) AND (value <= '-1'::integer))
Rows Removed by Filter: 10000
Planning time: 0.671 ms
Execution time: 1.950 ms
(5 rows)

where before we'd get:

QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..195.00 rows=50 width=4) (actual
time=0.903..0.903 rows=0 loops=1)
Filter: ((value >= 101) AND (value <= '-1'::integer))
Rows Removed by Filter: 10000
Planning time: 0.162 ms
Execution time: 0.925 ms
(5 rows)

Which comes from the 10000 * 0.005 selectivity estimate from tuples *
DEFAULT_RANGE_INEQ_SEL

I've attached a patch to this effect.

+                    /*
+                     * A zero or slightly negative s2 should be converted into
+                     * a small positive value; we probably are dealing with a
+                     * very tight range and got a bogus result due to roundoff
+                     * errors. However, if s2 is very negative, then we
+                     * probably have default selectivity estimates on one or
+                     * both sides of the range that we failed to recognize
+                     * above for some reason.
+                     */
+                    if (s2 <= 0.0)

That comment seems outdated

Otherwise, the patch LGTM, but I'd like to solve the quadratic
behavior too... are you going to try? Otherwise I could take a stab at
it myself. It doesn't seem very difficult.

Also, can you add a test case? Cost values could make the test
fragile, so if that gives you trouble I'll understand, but it'd be
best to try and test this if possible.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24David Steele
david@pgmasters.net
In reply to: Claudio Freire (#23)
Re: Making clausesel.c Smarter

On 4/7/17 5:33 PM, Claudio Freire wrote:

Also, can you add a test case? Cost values could make the test
fragile, so if that gives you trouble I'll understand, but it'd be
best to try and test this if possible.

This submission has been moved to CF 2017-07.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25David Rowley
david.rowley@2ndquadrant.com
In reply to: Claudio Freire (#23)
Re: Making clausesel.c Smarter

On 8 April 2017 at 09:33, Claudio Freire <klaussfreire@gmail.com> wrote:

Otherwise, the patch LGTM, but I'd like to solve the quadratic
behavior too... are you going to try? Otherwise I could take a stab at
it myself. It doesn't seem very difficult.

I have some ideas in my head in a fairly generic way of solving this
which I'd like to take care of in PG11.

Many thanks for your reviews and suggestions on this patch, it's much
appreciated.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Daniel Gustafsson
daniel@yesql.se
In reply to: David Rowley (#25)
Re: Making clausesel.c Smarter

On 09 Apr 2017, at 12:14, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 8 April 2017 at 09:33, Claudio Freire <klaussfreire@gmail.com> wrote:

Otherwise, the patch LGTM, but I'd like to solve the quadratic
behavior too... are you going to try? Otherwise I could take a stab at
it myself. It doesn't seem very difficult.

I have some ideas in my head in a fairly generic way of solving this
which I'd like to take care of in PG11.

This patch was moved to the currently open Commitfest. Given the above
comment, is the last patch in this thread still up for review, or are you
working on a new version? Just to avoid anyone reviewing an outdated version.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27David Rowley
david.rowley@2ndquadrant.com
In reply to: Daniel Gustafsson (#26)
Re: Making clausesel.c Smarter

On 6 September 2017 at 00:43, Daniel Gustafsson <daniel@yesql.se> wrote:

This patch was moved to the currently open Commitfest. Given the above
comment, is the last patch in this thread still up for review, or are you
working on a new version? Just to avoid anyone reviewing an outdated version.

Hi Daniel,

I've not had time to work on a new version yet and I can't imagine
that I will for quite some time.

The idea I had to remove the quadratic search on the list was to add
to or modify equal() in equalfuncs.c to have it return -1, 0, +1 and
use that as a comparison function in a binary search tree. The Btree
would complement List as a way of storing Nodes in no particular
order, but in a way that a Node can be found quickly. There's likely
more than a handful of places in the planner that would benefit from
this already.

It's not a 5-minute patch and it probably would see some resistance,
so won't have time to look at this soon.

If the possibility of this increasing planning time in corner cases is
going to be a problem, then it might be best to return this with
feedback for now and I'll resubmit if I get time later in the cycle.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Daniel Gustafsson
daniel@yesql.se
In reply to: David Rowley (#27)
Re: Making clausesel.c Smarter

On 06 Sep 2017, at 07:13, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 6 September 2017 at 00:43, Daniel Gustafsson <daniel@yesql.se> wrote:

This patch was moved to the currently open Commitfest. Given the above
comment, is the last patch in this thread still up for review, or are you
working on a new version? Just to avoid anyone reviewing an outdated version.

Hi Daniel,

I've not had time to work on a new version yet and I can't imagine
that I will for quite some time.

The idea I had to remove the quadratic search on the list was to add
to or modify equal() in equalfuncs.c to have it return -1, 0, +1 and
use that as a comparison function in a binary search tree. The Btree
would complement List as a way of storing Nodes in no particular
order, but in a way that a Node can be found quickly. There's likely
more than a handful of places in the planner that would benefit from
this already.

It's not a 5-minute patch and it probably would see some resistance,
so won't have time to look at this soon.

If the possibility of this increasing planning time in corner cases is
going to be a problem, then it might be best to return this with
feedback for now and I'll resubmit if I get time later in the cycle.

For now I’ve marked this “Waiting on Author” awaiting a new patch for the
community to consider. If there is no time to hack on this in the current CF
(which seems likely given your mail above) I’ll move it to the next CF as this
one draws to a close.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#28)
Re: Making clausesel.c Smarter

On 07 Sep 2017, at 09:30, Daniel Gustafsson <daniel@yesql.se> wrote:

On 06 Sep 2017, at 07:13, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 6 September 2017 at 00:43, Daniel Gustafsson <daniel@yesql.se> wrote:

This patch was moved to the currently open Commitfest. Given the above
comment, is the last patch in this thread still up for review, or are you
working on a new version? Just to avoid anyone reviewing an outdated version.

Hi Daniel,

I've not had time to work on a new version yet and I can't imagine
that I will for quite some time.

The idea I had to remove the quadratic search on the list was to add
to or modify equal() in equalfuncs.c to have it return -1, 0, +1 and
use that as a comparison function in a binary search tree. The Btree
would complement List as a way of storing Nodes in no particular
order, but in a way that a Node can be found quickly. There's likely
more than a handful of places in the planner that would benefit from
this already.

It's not a 5-minute patch and it probably would see some resistance,
so won't have time to look at this soon.

If the possibility of this increasing planning time in corner cases is
going to be a problem, then it might be best to return this with
feedback for now and I'll resubmit if I get time later in the cycle.

For now I’ve marked this “Waiting on Author” awaiting a new patch for the
community to consider. If there is no time to hack on this in the current CF
(which seems likely given your mail above) I’ll move it to the next CF as this
one draws to a close.

Going back on my previous thinking of moving to the next CF, I am instead
marking this as returned with feedback. When a new version of the patch is
ready, please re-submit to a future commitfest.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers