Use extended statistics to estimate (Var op Var) clauses

Started by Tomas Vondraabout 5 years ago49 messages
#1Tomas Vondra
tomas.vondra@enterprisedb.com
1 attachment(s)

Hi,

Attached is a patch to allow estimation of (Var op Var) clauses using
extended statistics. Currently we only use extended stats to estimate
(Var op Const) clauses, which is sufficient for most cases, but it's not
very hard to support this second type of clauses.

This is not an entirely new patch - I've originally included it in the
patch series in [1]/messages/by-id/20200113230008.g67iyk4cs3xbnjju@development but it's probably better to discuss it separately,
so that it does not get buried in that discussion.

[1]: /messages/by-id/20200113230008.g67iyk4cs3xbnjju@development
/messages/by-id/20200113230008.g67iyk4cs3xbnjju@development

To illustrate the purpose of this patch, consider this:

db=# create table t (a int, b int);
CREATE TABLE

db=# insert into t select mod(i,10), mod(i,10)+1
from generate_series(1,100000) s(i);
INSERT 0 100000

db=# analyze t;
ANALYZE

db=# explain select * from t where a < b;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t (cost=0.00..1693.00 rows=33333 width=8)
Filter: (a < b)
(2 rows)

db=# explain select * from t where a > b;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t (cost=0.00..1693.00 rows=33333 width=8)
Filter: (a > b)
(2 rows)

db=# create statistics s (mcv) on a,b from t;
CREATE STATISTICS

db=# analyze t;
ANALYZE

db=# explain select * from t where a < b;
QUERY PLAN
---------------------------------------------------------
Seq Scan on t (cost=0.00..1693.00 rows=100000 width=8)
Filter: (a < b)
(2 rows)

db=# explain select * from t where a > b;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..1693.00 rows=1 width=8)
Filter: (a > b)
(2 rows)

I'm not entirely convinced this patch (on it's own) is very useful, for
a couple of reasons:

(a) Clauses of this form are not particularly common, at least compared
to the Var op Const clauses. (I don't recall slow-query reports from any
of our mailing lists that might be attributed to such clauses.)

(b) For known cases of such queries (e.g. several TPC-H queries do use
clauses like "l_commitdate < l_receiptdate" etc.) this is somewhat
hindered by extended statistics only supporting MCV lists, which may not
work particularly well for high-cardinality columns like dates etc.

But despite that it seems like a useful feature / building block, and
those limitations may be addressed in some other way (e.g. we may add
multi-dimensional histograms to address the second one).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Support-estimation-of-clauses-of-the-form-V-20201113.patchtext/x-patch; charset=UTF-8; name=0001-Support-estimation-of-clauses-of-the-form-V-20201113.patchDownload
From 439d39db128b9cbc06063a862283d663510d0fcc Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Fri, 13 Nov 2020 01:46:50 +0100
Subject: [PATCH] Support estimation of clauses of the form Var op Var

Until now, extended statistics were used only to estimate clauses of the
form (Var op Const). This patch extends the estimation to also handle
clauses (Var op Var).
---
 src/backend/statistics/extended_stats.c       | 67 +++++++++----
 src/backend/statistics/mcv.c                  | 71 +++++++++++++-
 .../statistics/extended_stats_internal.h      |  2 +-
 src/test/regress/expected/stats_ext.out       | 96 +++++++++++++++++++
 src/test/regress/sql/stats_ext.sql            | 26 +++++
 5 files changed, 243 insertions(+), 19 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 36326927c6..1a6e7a3fc0 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -987,14 +987,18 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 	{
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
-		Var		   *var;
+		Var		   *var,
+				   *var2;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
-		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_clause_args(expr->args, &var, NULL, NULL))
+		/*
+		 * Check if the expression has the right shape (one Var and one Const,
+		 * or two Vars).
+		 */
+		if (!examine_clause_args(expr->args, &var, &var2, NULL, NULL))
 			return false;
 
 		/*
@@ -1034,7 +1038,20 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			!get_func_leakproof(get_opcode(expr->opno)))
 			return false;
 
-		return statext_is_compatible_clause_internal(root, (Node *) var,
+		/*
+		 * Check compatibility of the first Var - we get this one for both
+		 * types of supported expressions (Var op Const) and (Var op Var).
+		 */
+		if (!statext_is_compatible_clause_internal(root, (Node *) var,
+												   relid, attnums))
+			return false;
+
+		/* For (Var op Const) we don't get the second Var, and we're done. */
+		if (!var2)
+			return true;
+
+		/* For (Var op Var) check compatibility of the second Var. */
+		return statext_is_compatible_clause_internal(root, (Node *) var2,
 													 relid, attnums);
 	}
 
@@ -1050,7 +1067,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_clause_args(expr->args, &var, NULL, NULL))
+		if (!examine_clause_args(expr->args, &var, NULL, NULL, NULL))
 			return false;
 
 		/*
@@ -1446,22 +1463,24 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
 }
 
 /*
- * examine_opclause_expression
+ * examine_clause_args
  *		Split expression into Var and Const parts.
  *
- * Attempts to match the arguments to either (Var op Const) or (Const op Var),
- * possibly with a RelabelType on top. When the expression matches this form,
- * returns true, otherwise returns false.
+ * Attempts to match the arguments to either (Var op Const) or (Const op Var)
+ * or (Var op Var), possibly with a RelabelType on top. When the expression
+ * matches this form, returns true, otherwise returns false.
  *
  * Optionally returns pointers to the extracted Var/Const nodes, when passed
  * non-null pointers (varp, cstp and varonleftp). The varonleftp flag specifies
  * on which side of the operator we found the Var node.
  */
 bool
-examine_clause_args(List *args, Var **varp, Const **cstp, bool *varonleftp)
+examine_clause_args(List *args, Var **var1p, Var **var2p, Const **cstp,
+					bool *varonleftp)
 {
-	Var		   *var;
-	Const	   *cst;
+	Var	   *var1 = NULL;
+	Var	   *var2 = NULL;
+	Const  *cst = NULL;
 	bool		varonleft;
 	Node	   *leftop,
 			   *rightop;
@@ -1481,22 +1500,38 @@ examine_clause_args(List *args, Var **varp, Const **cstp, bool *varonleftp)
 
 	if (IsA(leftop, Var) && IsA(rightop, Const))
 	{
-		var = (Var *) leftop;
+		var1 = (Var *) leftop;
 		cst = (Const *) rightop;
 		varonleft = true;
 	}
 	else if (IsA(leftop, Const) && IsA(rightop, Var))
 	{
-		var = (Var *) rightop;
+		var1 = (Var *) rightop;
 		cst = (Const *) leftop;
 		varonleft = false;
 	}
+	else if (IsA(leftop, Var) && IsA(rightop, Var))
+	{
+		var1 = (Var *) leftop;
+		var2 = (Var *) rightop;
+		varonleft = false;
+
+		/*
+		 * Both variables have to be for the same relation (otherwise it's
+		 * a join clause, and we don't deal with those yet.
+		 */
+		if (var1->varno != var2->varno)
+			return false;
+	}
 	else
 		return false;
 
 	/* return pointers to the extracted parts if requested */
-	if (varp)
-		*varp = var;
+	if (var1p)
+		*var1p = var1;
+
+	if (var2p)
+		*var2p = var2;
 
 	if (cstp)
 		*cstp = cst;
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 6a262f1543..ac9a8020b8 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1583,13 +1583,17 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_clause_args returns true */
 			Var		   *var;
+			Var		   *var2;
 			Const	   *cst;
 			bool		varonleft;
 
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var and const from the expression */
-			if (examine_clause_args(expr->args, &var, &cst, &varonleft))
+			if (!examine_clause_args(expr->args, &var, &var2, &cst, &varonleft))
+				continue;
+
+			if (cst)	/* Var op Const */
 			{
 				int			idx;
 
@@ -1653,6 +1657,68 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 					matches[i] = RESULT_MERGE(matches[i], is_or, match);
 				}
 			}
+			else	/* Var op Var */
+			{
+				int			idx;
+				int			idx2;
+
+				Assert(var2);
+
+				/* match the attribute to a dimension of the statistic */
+				idx = bms_member_index(keys, var->varattno);
+				idx2 = bms_member_index(keys, var2->varattno);
+
+				/*
+				 * Walk through the MCV items and evaluate the current clause.
+				 * We can skip items that were already ruled out, and
+				 * terminate if there are no remaining MCV items that might
+				 * possibly match.
+				 */
+				for (i = 0; i < mcvlist->nitems; i++)
+				{
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					/*
+					 * When either of the MCV items is NULL we can treat this
+					 * as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || item->isnull[idx2])
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap.
+					 * Once the value gets false for AND-lists, or true for
+					 * OR-lists, we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * First check whether the constant is below the lower
+					 * boundary (in that case we can skip the bucket, because
+					 * there's no overlap).
+					 *
+					 * We don't store collations used to build the statistics,
+					 * but we can use the collation for the attribute itself,
+					 * as stored in varcollid. We do reset the statistics after
+					 * a type change (including collation change), so this is
+					 * OK. We may need to relax this after allowing extended
+					 * statistics on expressions.
+					 */
+					match = DatumGetBool(FunctionCall2Coll(&opproc,
+														   var->varcollid,
+														   item->values[idx],
+														   item->values[idx2]));
+
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
+				}
+			}
 		}
 		else if (IsA(clause, ScalarArrayOpExpr))
 		{
@@ -1667,7 +1733,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var and const from the expression */
-			if (examine_clause_args(expr->args, &var, &cst, &varonleft))
+			if (examine_clause_args(expr->args, &var, NULL, &cst, &varonleft))
 			{
 				int			idx;
 
@@ -1681,6 +1747,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 				/* ScalarArrayOpExpr has the Var always on the left */
 				Assert(varonleft);
+				Assert(cst);
 
 				if (!cst->constisnull)
 				{
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 61e69696cf..8f0bf2df5c 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -96,7 +96,7 @@ extern SortItem *build_sorted_items(int numrows, int *nitems, HeapTuple *rows,
 									TupleDesc tdesc, MultiSortSupport mss,
 									int numattrs, AttrNumber *attnums);
 
-extern bool examine_clause_args(List *args, Var **varp,
+extern bool examine_clause_args(List *args, Var **var1p, Var **var2p,
 								Const **cstp, bool *varonleftp);
 
 extern Selectivity mcv_clauselist_selectivity(PlannerInfo *root,
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 4c3edd213f..f77d1d5469 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -978,6 +978,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        343 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      1667 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -1113,6 +1125,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        200 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      3750 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -1323,6 +1347,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       3750 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+        25 |   2500
+(1 row)
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 ANALYZE mcv_lists;
@@ -1356,6 +1386,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       2500 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -1450,6 +1486,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
       1094 |      0
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      9950 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 ANALYZE mcv_lists_bool;
@@ -1477,6 +1525,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
          1 |      0
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -1512,6 +1572,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AN
          4 |    142
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+         1 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
 CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
@@ -1534,6 +1612,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AN
        143 |    142
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
 DROP TABLE mcv_lists_multi;
 -- Permission tests. Users should not be able to see specific data values in
 -- the extended statistics, if they lack permission to see those values in
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 9781e590a3..7ae02397a9 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -512,6 +512,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -561,6 +565,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE 4 >= a AND ''0
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -671,6 +679,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 
@@ -689,6 +699,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -764,6 +776,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 
@@ -777,6 +793,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -800,6 +820,9 @@ ANALYZE mcv_lists_multi;
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
@@ -810,6 +833,9 @@ ANALYZE mcv_lists_multi;
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 DROP TABLE mcv_lists_multi;
 
-- 
2.26.2

#2Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tomas Vondra (#1)
Re: Use extended statistics to estimate (Var op Var) clauses

On Nov 12, 2020, at 5:14 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

<0001-Support-estimation-of-clauses-of-the-form-V-20201113.patch>

Your patch no longer applies. Can we get a new version please?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#2)
Re: Use extended statistics to estimate (Var op Var) clauses

On 3/1/21 8:58 PM, Mark Dilger wrote:

On Nov 12, 2020, at 5:14 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

<0001-Support-estimation-of-clauses-of-the-form-V-20201113.patch>

Your patch no longer applies. Can we get a new version please?

I do not plan to work on this patch in the 2021-03 commitfest. I'll
focus on the other patch about extended statistics on expressions.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#3)
1 attachment(s)
Re: Use extended statistics to estimate (Var op Var) clauses

Hi,

Here is a slightly updated version of the patch - rebased to current
master and fixing some minor issues to handle expressions (and not just
the Var nodes as before).

The changes needed to support (Expr op Expr) are mostly mechanical,
though I'm sure the code needs some cleanup. The main issue I ran into
is the special case clauselist_selectivity, which does

if (list_length(clauses) == 1)
return clause_selectivity_ext(...);

which applies to cases like "WHERE a < b" which can now be handled by
extended statistics, thanks to this patch. But clause_selectivity_ext
only used to call restriction_selectivity for these clauses, which does
not use extended statistics, of course.

I considered either getting rid of the special case, passing everything
through extended stats, including cases with a single clause. But that
ends up affecting e.g. OR clauses, so I tweaked clause_selectivity_ext a
bit, which seems like a better approach.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Handling-Expr-op-Expr-clauses-in-extended-s-20210613.patchtext/x-patch; charset=UTF-8; name=0001-Handling-Expr-op-Expr-clauses-in-extended-s-20210613.patchDownload
From 6dcd4f2fc78be2fca8a4e934fc2e24bcd8340c4c Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 13 Jun 2021 21:59:16 +0200
Subject: [PATCH] Handling Expr op Expr clauses in extended stats

---
 src/backend/optimizer/path/clausesel.c        |  46 ++++-
 src/backend/statistics/extended_stats.c       |  81 +++++++--
 src/backend/statistics/mcv.c                  | 169 +++++++++++++-----
 .../statistics/extended_stats_internal.h      |   2 +-
 src/test/regress/expected/stats_ext.out       |  96 ++++++++++
 src/test/regress/sql/stats_ext.sql            |  26 +++
 6 files changed, 350 insertions(+), 70 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d263ecf082..d732a9dc93 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -714,6 +714,7 @@ clause_selectivity_ext(PlannerInfo *root,
 	Selectivity s1 = 0.5;		/* default for any unhandled clause type */
 	RestrictInfo *rinfo = NULL;
 	bool		cacheable = false;
+	Node	   *src = clause;
 
 	if (clause == NULL)			/* can this still happen? */
 		return s1;
@@ -871,11 +872,46 @@ clause_selectivity_ext(PlannerInfo *root,
 		}
 		else
 		{
-			/* Estimate selectivity for a restriction clause. */
-			s1 = restriction_selectivity(root, opno,
-										 opclause->args,
-										 opclause->inputcollid,
-										 varRelid);
+			/*
+			 * It might be (Expr op Expr), which goes here thanks to the
+			 * optimization at the beginning of clauselist_selectivity.
+			 * So try applying extended stats first, then fall back to
+			 * restriction_selectivity.
+			 *
+			 * XXX Kinda does the same thing as clauselist_selectivity, but
+			 * for a single clause. Maybe we could call that, but need to
+			 * be careful not to cause infinite loop.
+			 */
+			bool	estimated = false;
+
+			if (use_extended_stats)
+			{
+				Bitmapset *estimatedclauses = NULL;
+				List *clauses = list_make1(src);
+				RelOptInfo *rel = find_single_rel_for_clauses(root, clauses);
+
+				if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+				{
+					/*
+					 * Estimate as many clauses as possible using extended statistics.
+					 *
+					 * 'estimatedclauses' is populated with the 0-based list position
+					 * index of clauses estimated here, and that should be ignored below.
+					 */
+					s1 = statext_clauselist_selectivity(root, clauses, varRelid,
+														jointype, sjinfo, rel,
+														&estimatedclauses, false);
+
+					estimated = (bms_num_members(estimatedclauses) == 1);
+				}
+			}
+
+			/* Estimate selectivity for a restriction clause (fallback). */
+			if (!estimated)
+				s1 = restriction_selectivity(root, opno,
+											 opclause->args,
+											 opclause->inputcollid,
+											 varRelid);
 		}
 
 		/*
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index b05e818ba9..59a9a64b09 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1352,14 +1352,15 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 	{
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
-		Node	   *clause_expr;
+		Node	   *clause_expr,
+				   *clause_expr2;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
 		/* Check if the expression has the right shape */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_expr, &clause_expr2, NULL, NULL))
 			return false;
 
 		/*
@@ -1399,13 +1400,44 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			!get_func_leakproof(get_opcode(expr->opno)))
 			return false;
 
+		/* clause_expr is always valid */
+		Assert(clause_expr);
+
 		/* Check (Var op Const) or (Const op Var) clauses by recursing. */
 		if (IsA(clause_expr, Var))
-			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+		{
+			if (!statext_is_compatible_clause_internal(root, clause_expr,
+													   relid, attnums, exprs))
+				return false;
+		}
+		else
+		{
+			/* Otherwise we have (Expr op Const) or (Const op Expr) or (Expr op Expr). */
+			*exprs = lappend(*exprs, clause_expr);
+		}
+
+		/* without second expression, it has to be Expr op Const, so we're done */
+		if (!clause_expr2)
+			return true;
+
+		/*
+		 * Do the same thing as for the first expression.
+		 *
+		 * XXX Need to be careful later about matching both expressions at
+		 * the same time.
+		 */
+		if (IsA(clause_expr2, Var))
+		{
+			if (!statext_is_compatible_clause_internal(root, clause_expr2,
+													   relid, attnums, exprs))
+				return false;
+		}
+		else
+		{
+			/* Otherwise we have (Expr op Const) or (Const op Expr). */
+			*exprs = lappend(*exprs, clause_expr2);
+		}
 
-		/* Otherwise we have (Expr op Const) or (Const op Expr). */
-		*exprs = lappend(*exprs, clause_expr);
 		return true;
 	}
 
@@ -1421,7 +1453,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL, NULL))
 			return false;
 
 		/*
@@ -2010,19 +2042,20 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  *		Split an operator expression's arguments into Expr and Const parts.
  *
  * Attempts to match the arguments to either (Expr op Const) or (Const op
- * Expr), possibly with a RelabelType on top. When the expression matches this
- * form, returns true, otherwise returns false.
+ * Expr) or (Expr op Expr), possibly with a RelabelType on top. When the
+ * expression matches this form, returns true, otherwise returns false.
  *
  * Optionally returns pointers to the extracted Expr/Const nodes, when passed
  * non-null pointers (exprp, cstp and expronleftp). The expronleftp flag
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, Node **exprp, Const **cstp,
+examine_opclause_args(List *args, Node **expr1p, Node **expr2p, Const **cstp,
 					  bool *expronleftp)
 {
-	Node	   *expr;
-	Const	   *cst;
+	Node	   *expr1 = NULL;
+	Node	   *expr2 = NULL;
+	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
@@ -2042,22 +2075,36 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 
 	if (IsA(rightop, Const))
 	{
-		expr = (Node *) leftop;
+		expr1 = (Node *) leftop;
 		cst = (Const *) rightop;
 		expronleft = true;
 	}
 	else if (IsA(leftop, Const))
 	{
-		expr = (Node *) rightop;
+		expr1 = (Node *) rightop;
 		cst = (Const *) leftop;
 		expronleft = false;
 	}
 	else
-		return false;
+	{
+		expr1 = (Node *) leftop;
+		expr2 = (Node *) rightop;
+		expronleft = false;
+
+		/*
+		 * FIXME Both variables have to be for the same relation (otherwise
+		 * it's a join clause, and we don't deal with those yet. Need to
+		 * call pull_varnos on both sides or something like that.
+		 */
+		
+	}
 
 	/* return pointers to the extracted parts if requested */
-	if (exprp)
-		*exprp = expr;
+	if (expr1p)
+		*expr1p = expr1;
+
+	if (expr2p)
+		*expr2p = expr2;
 
 	if (cstp)
 		*cstp = cst;
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index ef118952c7..59538f804e 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1645,78 +1645,153 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			Node	   *clause_expr2;
 			Const	   *cst;
 			bool		expronleft;
-			int			idx;
 			Oid			collid;
 
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_expr, &clause_expr2, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
-			/* match the attribute/expression to a dimension of the statistic */
-			idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
-
-			Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
-
-			/*
-			 * Walk through the MCV items and evaluate the current clause. We
-			 * can skip items that were already ruled out, and terminate if
-			 * there are no remaining MCV items that might possibly match.
-			 */
-			for (i = 0; i < mcvlist->nitems; i++)
+			if (cst)	/* Expr op Const */
 			{
-				bool		match = true;
-				MCVItem    *item = &mcvlist->items[i];
+				int idx;
 
-				Assert(idx >= 0);
+				/* match the attribute/expression to a dimension of the statistic */
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * When the MCV item or the Const value is NULL we can treat
-				 * this as a mismatch. We must not call the operator because
-				 * of strictness.
+				 * Walk through the MCV items and evaluate the current clause. We
+				 * can skip items that were already ruled out, and terminate if
+				 * there are no remaining MCV items that might possibly match.
 				 */
-				if (item->isnull[idx] || cst->constisnull)
+				for (i = 0; i < mcvlist->nitems; i++)
 				{
-					matches[i] = RESULT_MERGE(matches[i], is_or, false);
-					continue;
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					Assert(idx >= 0);
+
+					/*
+					 * When the MCV item or the Const value is NULL we can treat
+					 * this as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || cst->constisnull)
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap. Once
+					 * the value gets false for AND-lists, or true for OR-lists,
+					 * we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * First check whether the constant is below the lower
+					 * boundary (in that case we can skip the bucket, because
+					 * there's no overlap).
+					 *
+					 * We don't store collations used to build the statistics, but
+					 * we can use the collation for the attribute itself, as
+					 * stored in varcollid. We do reset the statistics after a
+					 * type change (including collation change), so this is OK.
+					 * For expressions, we use the collation extracted from the
+					 * expression itself.
+					 */
+					if (expronleft)
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   item->values[idx],
+															   cst->constvalue));
+					else
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   cst->constvalue,
+															   item->values[idx]));
+
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
 				}
+			}
+			else	/* Expr op Expr */
+			{
+				int			idx;
+				int			idx2;
+
+				Assert(clause_expr2);
+				Assert(!expronleft);
 
 				/*
-				 * Skip MCV items that can't change result in the bitmap. Once
-				 * the value gets false for AND-lists, or true for OR-lists,
-				 * we don't need to look at more clauses.
+				 * Match the expressions to a dimension of the statistic.
+				 *
+				 * XXX Can the collations differ?
 				 */
-				if (RESULT_IS_FINAL(matches[i], is_or))
-					continue;
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+				idx2 = mcv_match_expression(clause_expr2, keys, exprs, &collid);
+
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert((idx2 >= 0) && (idx2 < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * First check whether the constant is below the lower
-				 * boundary (in that case we can skip the bucket, because
-				 * there's no overlap).
-				 *
-				 * We don't store collations used to build the statistics, but
-				 * we can use the collation for the attribute itself, as
-				 * stored in varcollid. We do reset the statistics after a
-				 * type change (including collation change), so this is OK.
-				 * For expressions, we use the collation extracted from the
-				 * expression itself.
+				 * Walk through the MCV items and evaluate the current clause.
+				 * We can skip items that were already ruled out, and
+				 * terminate if there are no remaining MCV items that might
+				 * possibly match.
 				 */
-				if (expronleft)
+				for (i = 0; i < mcvlist->nitems; i++)
+				{
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					/*
+					 * When either of the MCV items is NULL we can treat this
+					 * as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || item->isnull[idx2])
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap.
+					 * Once the value gets false for AND-lists, or true for
+					 * OR-lists, we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * First check whether the constant is below the lower
+					 * boundary (in that case we can skip the bucket, because
+					 * there's no overlap).
+					 *
+					 * We don't store collations used to build the statistics,
+					 * but we can use the collation for the attribute itself,
+					 * as stored in varcollid. We do reset the statistics after
+					 * a type change (including collation change), so this is
+					 * OK. We may need to relax this after allowing extended
+					 * statistics on expressions.
+					 */
 					match = DatumGetBool(FunctionCall2Coll(&opproc,
 														   collid,
 														   item->values[idx],
-														   cst->constvalue));
-				else
-					match = DatumGetBool(FunctionCall2Coll(&opproc,
-														   collid,
-														   cst->constvalue,
-														   item->values[idx]));
+														   item->values[idx2]));
 
-				/* update the match bitmap with the result */
-				matches[i] = RESULT_MERGE(matches[i], is_or, match);
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
+				}
 			}
 		}
 		else if (IsA(clause, ScalarArrayOpExpr))
@@ -1743,7 +1818,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_expr, NULL, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 55cd9252a5..52b842bbe5 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -99,7 +99,7 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									MultiSortSupport mss,
 									int numattrs, AttrNumber *attnums);
 
-extern bool examine_opclause_args(List *args, Node **exprp,
+extern bool examine_opclause_args(List *args, Node **expr1p, Node **expr2p,
 								  Const **cstp, bool *expronleftp);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 8c214d8dfc..84b93299de 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1900,6 +1900,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        343 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      1667 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2047,6 +2059,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        200 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      3750 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2456,6 +2480,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       3750 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+        25 |   2500
+(1 row)
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 ANALYZE mcv_lists;
@@ -2489,6 +2519,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       2500 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -2583,6 +2619,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
       1094 |      0
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      9950 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 ANALYZE mcv_lists_bool;
@@ -2738,6 +2786,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 (1 row)
 
 DROP TABLE mcv_lists_partial;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -2797,6 +2857,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       2649 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+         1 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
 CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
@@ -2843,6 +2921,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       1571 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
 DROP TABLE mcv_lists_multi;
 -- statistics on integer expressions
 CREATE TABLE expr_stats (a int, b int, c int);
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index e033080d4f..3ace3faa09 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -946,6 +946,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -999,6 +1003,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -1198,6 +1206,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 
@@ -1216,6 +1226,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -1291,6 +1303,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 
@@ -1376,6 +1392,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 
 DROP TABLE mcv_lists_partial;
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -1403,6 +1423,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
@@ -1417,6 +1440,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 DROP TABLE mcv_lists_multi;
 
-- 
2.31.1

#5Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#4)
Re: Use extended statistics to estimate (Var op Var) clauses

On Sun, Jun 13, 2021 at 1:29 PM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

Hi,

Here is a slightly updated version of the patch - rebased to current
master and fixing some minor issues to handle expressions (and not just
the Var nodes as before).

The changes needed to support (Expr op Expr) are mostly mechanical,
though I'm sure the code needs some cleanup. The main issue I ran into
is the special case clauselist_selectivity, which does

if (list_length(clauses) == 1)
return clause_selectivity_ext(...);

which applies to cases like "WHERE a < b" which can now be handled by
extended statistics, thanks to this patch. But clause_selectivity_ext
only used to call restriction_selectivity for these clauses, which does
not use extended statistics, of course.

I considered either getting rid of the special case, passing everything
through extended stats, including cases with a single clause. But that
ends up affecting e.g. OR clauses, so I tweaked clause_selectivity_ext a
bit, which seems like a better approach.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Hi,

-           for (i = 0; i < mcvlist->nitems; i++)
+           if (cst)    /* Expr op Const */

It seems the Const op Expr is also covered by this if branch. Hence the
comment should include this case.

Cheers

#6Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tomas Vondra (#4)
Re: Use extended statistics to estimate (Var op Var) clauses

On Jun 13, 2021, at 1:28 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

Here is a slightly updated version of the patch

Thanks for taking this up again!

Applying the new test cases from your patch, multiple estimates have gotten better. That looks good. I wrote a few extra test cases and saw no change, which is fine. I was looking for regressions where the estimates are now worse than before. Do you expect there to be any such cases?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#6)
Re: Use extended statistics to estimate (Var op Var) clauses

On 6/14/21 5:36 PM, Mark Dilger wrote:

On Jun 13, 2021, at 1:28 PM, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Here is a slightly updated version of the patch

Thanks for taking this up again!

Applying the new test cases from your patch, multiple estimates have
gotten better. That looks good. I wrote a few extra test cases and
saw no change, which is fine. I was looking for regressions where
the estimates are now worse than before. Do you expect there to be
any such cases?

Not really. These clauses could not be estimated before, we generally
used default estimates for them. So

WHERE a = b

would use 0.5%, while

WHERE a < b

would use 33%, and so on. OTOH it depends on the accuracy of the
extended statistics - particularly the MCV list (what fraction of the
data it covers, etc.).

So it's possible the default estimate is very accurate by chance, and
MCV list represents only a tiny fraction of the data. Then the new
estimate could we worse. Consider for example this:

create table t (a int, b int);
insert into t select 100, 100 from generate_series(1,5000) s(i);
insert into t select i, i+1 from generate_series(1,995000) s(i);

This has exactly 0.5% of rows with (a=b). Without extended stats it's
perfect:

explain analyze select * from t where a = b;

Seq Scan on t (cost=0.00..16925.00 rows=5000 width=8)
(actual time=0.064..159.928 rows=5000 loops=1)

while with statistics it gets worse:

create statistics s (mcv) on a, b from t;
analyze t;

Seq Scan on t (cost=0.00..16925.00 rows=9810 width=8)
(actual time=0.059..160.467 rows=5000 loops=1)

It's not terrible, although we could construct worse examples. But the
same issue applies to other clauses, not just to these new ones. And it
relies on the regular estimation producing better estimate by chance.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tomas Vondra (#4)
Re: Use extended statistics to estimate (Var op Var) clauses

On Sun, 13 Jun 2021 at 21:28, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Here is a slightly updated version of the patch

The main issue I ran into
is the special case clauselist_selectivity, which does

if (list_length(clauses) == 1)
return clause_selectivity_ext(...);

which applies to cases like "WHERE a < b" which can now be handled by
extended statistics, thanks to this patch. But clause_selectivity_ext
only used to call restriction_selectivity for these clauses, which does
not use extended statistics, of course.

I considered either getting rid of the special case, passing everything
through extended stats, including cases with a single clause. But that
ends up affecting e.g. OR clauses, so I tweaked clause_selectivity_ext a
bit, which seems like a better approach.

Yeah, I looked at this a few months ago, while looking at the other
extended stats stuff, and I came to the same conclusion that solving
this issue by tweaking clause_selectivity_ext() is the best approach.

I haven't looked at the patch in much detail yet, but I think the
basic approach looks OK.

There are a few comments that need updating, e.g.:
- In statext_is_compatible_clause_internal(), before the "if
(is_opclause(clause))" test.
- The description of the arguments for examine_opclause_args().

I wonder if "expronleftp" for examine_opclause_args() should be
"constonrightp", or some such -- as it stands it's being set to false
for an Expr Op Expr clause, which doesn't seem right because there
*is* an expression on the left.

Regards,
Dean

#9Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Dean Rasheed (#8)
Re: Use extended statistics to estimate (Var op Var) clauses

On Sun, 13 Jun 2021 at 21:28, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Here is a slightly updated version of the patch

Hi,

I have looked at this in some more detail, and it all looks pretty
good, other than some mostly cosmetic stuff.

The new code in statext_is_compatible_clause_internal() is a little
hard to follow because some of the comments aren't right (e.g. when
checking clause_expr2, it isn't an (Expr op Const) or (Const op Expr)
as the comment says). Rather than trying to comment on each
conditional branch, it might be simpler to just have a single
catch-all comment at the top, and also remove the "return true" in the
middle, to make it something like:

/*
* Check Vars appearing on either side by recursing, and make a note of
* any expressions.
*/
if (IsA(clause_expr, Var))
{
if (!statext_is_compatible_clause_internal(...))
return false;
}
else
*exprs = lappend(*exprs, clause_expr);

if (clause_expr2)
{
if (IsA(clause_expr2, Var))
{
if (!statext_is_compatible_clause_internal(...))
return false;
}
else
*exprs = lappend(*exprs, clause_expr2);
}

return true;

Is the FIXME comment in examine_opclause_args() necessary? The check
for a single relation has already been done in
clause[list]_selectivity_ext(), and I'm not sure what
examine_opclause_args() would do differently.

In mcv_get_match_bitmap(), perhaps do the RESULT_IS_FINAL() checks
first in each loop.

Also in mcv_get_match_bitmap(), the 2 "First check whether the
constant is below the lower boundary ..." comments don't make any
sense to me. Were those perhaps copied and pasted from somewhere else?
They should perhaps say "Otherwise, compare the MCVItem with the
constant" and "Otherwise compare the values from the MCVItem using the
clause operator", or something like that.

But other than such cosmetic things, I think the patch is good, and
gives some nice estimate improvements.

Regards,
Dean

#10Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Dean Rasheed (#9)
1 attachment(s)
Re: Use extended statistics to estimate (Var op Var) clauses

Hi,

On 7/5/21 2:46 PM, Dean Rasheed wrote:

On Sun, 13 Jun 2021 at 21:28, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Here is a slightly updated version of the patch

Hi,

I have looked at this in some more detail, and it all looks pretty
good, other than some mostly cosmetic stuff.

Thanks for the review!

The new code in statext_is_compatible_clause_internal() is a little
hard to follow because some of the comments aren't right (e.g. when
checking clause_expr2, it isn't an (Expr op Const) or (Const op Expr)
as the comment says). Rather than trying to comment on each
conditional branch, it might be simpler to just have a single
catch-all comment at the top, and also remove the "return true" in the
middle, to make it something like:

/*
* Check Vars appearing on either side by recursing, and make a note of
* any expressions.
*/
if (IsA(clause_expr, Var))
{
if (!statext_is_compatible_clause_internal(...))
return false;
}
else
*exprs = lappend(*exprs, clause_expr);

if (clause_expr2)
{
if (IsA(clause_expr2, Var))
{
if (!statext_is_compatible_clause_internal(...))
return false;
}
else
*exprs = lappend(*exprs, clause_expr2);
}

return true;

I ended up doing something slightly different - examine_opclause_args
now "returns" a list of expressions, instead of explicitly setting two
parameters. That means we can do a simple foreach() here, which seems
cleaner. It means we have to extract the expressions from the list in a
couple places, but that seems acceptable. Do you agree?

I also went through the comments and updated those that seemed wrong.

Is the FIXME comment in examine_opclause_args() necessary? The check
for a single relation has already been done in
clause[list]_selectivity_ext(), and I'm not sure what
examine_opclause_args() would do differently.

Yeah, I came to the same conclusion.

In mcv_get_match_bitmap(), perhaps do the RESULT_IS_FINAL() checks
first in each loop.

This is how master already does that now, and I wonder if it's done in
this order intentionally. It's not clear to me doing it in the other way
would be faster?

Also in mcv_get_match_bitmap(), the 2 "First check whether the
constant is below the lower boundary ..." comments don't make any
sense to me. Were those perhaps copied and pasted from somewhere else?
They should perhaps say "Otherwise, compare the MCVItem with the
constant" and "Otherwise compare the values from the MCVItem using the
clause operator", or something like that.

Yeah, that's another bit that comes from current master - the patch just
makes a new copy of the comment. I agree it's bogus, Seems like a
remainder of the original code which did various "smart" things we
removed over time. Will fix.

But other than such cosmetic things, I think the patch is good, and
gives some nice estimate improvements.

Thanks, sounds good. I guess the last thing is maybe mentioning this in
the docs, adding an example etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patchtext/x-patch; charset=UTF-8; name=0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patchDownload
From 788a909e09b372797c4b7b443e0e89c5d5181ec0 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Tue, 20 Jul 2021 20:15:13 +0200
Subject: [PATCH] Handling Expr op Expr clauses in extended stats

---
 src/backend/optimizer/path/clausesel.c        |  37 +++-
 src/backend/statistics/extended_stats.c       |  83 ++++++---
 src/backend/statistics/mcv.c                  | 172 +++++++++++++-----
 .../statistics/extended_stats_internal.h      |   4 +-
 src/test/regress/expected/stats_ext.out       |  96 ++++++++++
 src/test/regress/sql/stats_ext.sql            |  26 +++
 6 files changed, 341 insertions(+), 77 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d263ecf082..6a7e9ceea5 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -714,6 +714,7 @@ clause_selectivity_ext(PlannerInfo *root,
 	Selectivity s1 = 0.5;		/* default for any unhandled clause type */
 	RestrictInfo *rinfo = NULL;
 	bool		cacheable = false;
+	Node	   *src = clause;
 
 	if (clause == NULL)			/* can this still happen? */
 		return s1;
@@ -871,11 +872,37 @@ clause_selectivity_ext(PlannerInfo *root,
 		}
 		else
 		{
-			/* Estimate selectivity for a restriction clause. */
-			s1 = restriction_selectivity(root, opno,
-										 opclause->args,
-										 opclause->inputcollid,
-										 varRelid);
+			/*
+			 * It might be a single (Expr op Expr) clause, which goes here due
+			 * to the optimization at the beginning of clauselist_selectivity.
+			 * So we try applying extended stats first, and then fall back to
+			 * restriction_selectivity.
+			 */
+			bool	estimated = false;
+
+			if (use_extended_stats)
+			{
+				List	   *clauses = list_make1(src);
+				RelOptInfo *rel = find_single_rel_for_clauses(root, clauses);
+
+				if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+				{
+					Bitmapset  *estimatedclauses = NULL;
+
+					s1 = statext_clauselist_selectivity(root, clauses, varRelid,
+														jointype, sjinfo, rel,
+														&estimatedclauses, false);
+
+					estimated = (bms_num_members(estimatedclauses) == 1);
+				}
+			}
+
+			/* Estimate selectivity for a restriction clause (fallback). */
+			if (!estimated)
+				s1 = restriction_selectivity(root, opno,
+											 opclause->args,
+											 opclause->inputcollid,
+											 varRelid);
 		}
 
 		/*
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 2e55913bc8..606cf8c588 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1347,19 +1347,27 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		return true;
 	}
 
-	/* (Var/Expr op Const) or (Const op Var/Expr) */
+	/*
+	 * Three opclause variants are supported: (Expr op Const), (Const op Expr),
+	 * (Expr op Expr). That means we may need to analyze one or two expressions
+	 * to make sure the opclause is compatible with extended stats.
+	 */
 	if (is_opclause(clause))
 	{
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
-		Node	   *clause_expr;
+		ListCell   *lc;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
-		/* Check if the expression has the right shape */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		/*
+		 * Check if the expression has the right shape. This returns either one
+		 * or two expressions, depending on whether there is a Const.
+		 */
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
 		/*
@@ -1399,13 +1407,31 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			!get_func_leakproof(get_opcode(expr->opno)))
 			return false;
 
-		/* Check (Var op Const) or (Const op Var) clauses by recursing. */
-		if (IsA(clause_expr, Var))
-			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+		/*
+		 * There's always at least one expression, otherwise the clause would
+		 * not be considered compatible.
+		 */
+		Assert(list_length(clause_exprs) >= 1);
+
+		/*
+		 * Check all expressions by recursing. Var expressions are handled as
+		 * a special case (to match it to attnums etc.)
+		 */
+		foreach (lc, clause_exprs)
+		{
+			Node *clause_expr = (Node *) lfirst(lc);
+
+			if (IsA(clause_expr, Var))
+			{
+				/* if the Var is incompatible, the whole clause is incompatible */
+				if (!statext_is_compatible_clause_internal(root, clause_expr,
+														   relid, attnums, exprs))
+					return false;
+			}
+			else	/* generic expression */
+				*exprs = lappend(*exprs, clause_expr);
+		}
 
-		/* Otherwise we have (Expr op Const) or (Const op Expr). */
-		*exprs = lappend(*exprs, clause_expr);
 		return true;
 	}
 
@@ -1415,15 +1441,21 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) clause;
 		Node	   *clause_expr;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
+		/* There has to be one expression exactly. */
+		Assert(list_length(clause_exprs) == 1);
+
+		clause_expr = (Node *) linitial(clause_exprs);
+
 		/*
 		 * If it's not one of the supported operators ("=", "<", ">", etc.),
 		 * just ignore the clause, as it's not compatible with MCV lists.
@@ -2009,20 +2041,19 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  * examine_opclause_args
  *		Split an operator expression's arguments into Expr and Const parts.
  *
- * Attempts to match the arguments to either (Expr op Const) or (Const op
- * Expr), possibly with a RelabelType on top. When the expression matches this
- * form, returns true, otherwise returns false.
+ * Attempts to match the arguments to either (Expr op Const) or (Const op Expr)
+ * or (Expr op Expr), possibly with a RelabelType on top. When the expression
+ * matches this form, returns true, otherwise returns false.
  *
  * Optionally returns pointers to the extracted Expr/Const nodes, when passed
- * non-null pointers (exprp, cstp and expronleftp). The expronleftp flag
+ * non-null pointers (exprsp, cstp and expronleftp). The expronleftp flag
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, Node **exprp, Const **cstp,
-					  bool *expronleftp)
+examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp)
 {
-	Node	   *expr;
-	Const	   *cst;
+	List	   *exprs = NIL;
+	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
@@ -2042,22 +2073,26 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 
 	if (IsA(rightop, Const))
 	{
-		expr = (Node *) leftop;
+		exprs = lappend(exprs, leftop);
 		cst = (Const *) rightop;
 		expronleft = true;
 	}
 	else if (IsA(leftop, Const))
 	{
-		expr = (Node *) rightop;
+		exprs = lappend(exprs, rightop);
 		cst = (Const *) leftop;
 		expronleft = false;
 	}
 	else
-		return false;
+	{
+		exprs = lappend(exprs, leftop);
+		exprs = lappend(exprs, rightop);
+		expronleft = false;
+	}
 
 	/* return pointers to the extracted parts if requested */
-	if (exprp)
-		*exprp = expr;
+	if (exprsp)
+		*exprsp = exprs;
 
 	if (cstp)
 		*cstp = cst;
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index ef118952c7..85f650f572 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1645,78 +1645,154 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			Node	   *clause_expr2;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
-			int			idx;
 			Oid			collid;
 
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
-			/* match the attribute/expression to a dimension of the statistic */
-			idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+			if (cst)	/* Expr op Const */
+			{
+				int idx;
 
-			Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert(list_length(clause_exprs) == 1);
+				clause_expr = (Node *) linitial(clause_exprs);
 
-			/*
-			 * Walk through the MCV items and evaluate the current clause. We
-			 * can skip items that were already ruled out, and terminate if
-			 * there are no remaining MCV items that might possibly match.
-			 */
-			for (i = 0; i < mcvlist->nitems; i++)
-			{
-				bool		match = true;
-				MCVItem    *item = &mcvlist->items[i];
+				/* match the attribute/expression to a dimension of the statistic */
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
 
-				Assert(idx >= 0);
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * When the MCV item or the Const value is NULL we can treat
-				 * this as a mismatch. We must not call the operator because
-				 * of strictness.
+				 * Walk through the MCV items and evaluate the current clause. We
+				 * can skip items that were already ruled out, and terminate if
+				 * there are no remaining MCV items that might possibly match.
 				 */
-				if (item->isnull[idx] || cst->constisnull)
+				for (i = 0; i < mcvlist->nitems; i++)
 				{
-					matches[i] = RESULT_MERGE(matches[i], is_or, false);
-					continue;
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					Assert(idx >= 0);
+
+					/*
+					 * When the MCV item or the Const value is NULL we can treat
+					 * this as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || cst->constisnull)
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap. Once
+					 * the value gets false for AND-lists, or true for OR-lists,
+					 * we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics, but
+					 * we can use the collation for the attribute itself, as
+					 * stored in varcollid. We do reset the statistics after a
+					 * type change (including collation change), so this is OK.
+					 * For expressions, we use the collation extracted from the
+					 * expression itself.
+					 */
+					if (expronleft)
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   item->values[idx],
+															   cst->constvalue));
+					else
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   cst->constvalue,
+															   item->values[idx]));
+
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
 				}
+			}
+			else	/* Expr op Expr */
+			{
+				int			idx;
+				int			idx2;
+
+				Assert(list_length(clause_exprs) == 2);
+
+				clause_expr = (Node *) linitial(clause_exprs);
+				clause_expr2 = (Node *) lsecond(clause_exprs);
+
+				Assert(clause_expr2);
+				Assert(!expronleft);
 
 				/*
-				 * Skip MCV items that can't change result in the bitmap. Once
-				 * the value gets false for AND-lists, or true for OR-lists,
-				 * we don't need to look at more clauses.
+				 * Match the expressions to a dimension of the statistic.
+				 *
+				 * XXX Can the collations differ?
 				 */
-				if (RESULT_IS_FINAL(matches[i], is_or))
-					continue;
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+				idx2 = mcv_match_expression(clause_expr2, keys, exprs, &collid);
+
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert((idx2 >= 0) && (idx2 < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * First check whether the constant is below the lower
-				 * boundary (in that case we can skip the bucket, because
-				 * there's no overlap).
-				 *
-				 * We don't store collations used to build the statistics, but
-				 * we can use the collation for the attribute itself, as
-				 * stored in varcollid. We do reset the statistics after a
-				 * type change (including collation change), so this is OK.
-				 * For expressions, we use the collation extracted from the
-				 * expression itself.
+				 * Walk through the MCV items and evaluate the current clause.
+				 * We can skip items that were already ruled out, and
+				 * terminate if there are no remaining MCV items that might
+				 * possibly match.
 				 */
-				if (expronleft)
+				for (i = 0; i < mcvlist->nitems; i++)
+				{
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					/*
+					 * When either of the MCV items is NULL we can treat this
+					 * as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || item->isnull[idx2])
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap.
+					 * Once the value gets false for AND-lists, or true for
+					 * OR-lists, we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics,
+					 * but we can use the collation for the attribute itself,
+					 * as stored in varcollid. We do reset the statistics after
+					 * a type change (including collation change), so this is
+					 * OK. We may need to relax this after allowing extended
+					 * statistics on expressions.
+					 */
 					match = DatumGetBool(FunctionCall2Coll(&opproc,
 														   collid,
 														   item->values[idx],
-														   cst->constvalue));
-				else
-					match = DatumGetBool(FunctionCall2Coll(&opproc,
-														   collid,
-														   cst->constvalue,
-														   item->values[idx]));
+														   item->values[idx2]));
 
-				/* update the match bitmap with the result */
-				matches[i] = RESULT_MERGE(matches[i], is_or, match);
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
+				}
 			}
 		}
 		else if (IsA(clause, ScalarArrayOpExpr))
@@ -1726,6 +1802,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
 			Oid			collid;
@@ -1743,11 +1820,14 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
 			Assert(expronleft);
+			Assert(list_length(clause_exprs) == 1);
+
+			clause_expr = (Node *) linitial(clause_exprs);
 
 			/* XXX what if (cst->constisnull == NULL)? */
 			if (!cst->constisnull)
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 55cd9252a5..1f30fa9060 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -99,8 +99,8 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									MultiSortSupport mss,
 									int numattrs, AttrNumber *attnums);
 
-extern bool examine_opclause_args(List *args, Node **exprp,
-								  Const **cstp, bool *expronleftp);
+extern bool examine_opclause_args(List *args, List **exprs, Const **cstp,
+								  bool *expronleftp);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
 											 Selectivity mcv_sel,
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 62b05c79f9..93e4edbf3e 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1904,6 +1904,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        343 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      1667 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2051,6 +2063,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        200 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      3750 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2460,6 +2484,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       3750 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+        25 |   2500
+(1 row)
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 ANALYZE mcv_lists;
@@ -2493,6 +2523,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       2500 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -2587,6 +2623,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
       1094 |      0
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      9950 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 ANALYZE mcv_lists_bool;
@@ -2742,6 +2790,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 (1 row)
 
 DROP TABLE mcv_lists_partial;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -2801,6 +2861,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       2649 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+         1 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
 CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
@@ -2847,6 +2925,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       1571 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
 DROP TABLE mcv_lists_multi;
 -- statistics on integer expressions
 CREATE TABLE expr_stats (a int, b int, c int);
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index e033080d4f..3ace3faa09 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -946,6 +946,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -999,6 +1003,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -1198,6 +1206,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 
@@ -1216,6 +1226,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -1291,6 +1303,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 
@@ -1376,6 +1392,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 
 DROP TABLE mcv_lists_partial;
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -1403,6 +1423,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
@@ -1417,6 +1440,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 DROP TABLE mcv_lists_multi;
 
-- 
2.31.1

#11Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tomas Vondra (#10)
Re: Use extended statistics to estimate (Var op Var) clauses

On Tue, 20 Jul 2021 at 19:28, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

The new code in statext_is_compatible_clause_internal() is a little
hard to follow because some of the comments aren't right

I ended up doing something slightly different - examine_opclause_args
now "returns" a list of expressions, instead of explicitly setting two
parameters. That means we can do a simple foreach() here, which seems
cleaner. It means we have to extract the expressions from the list in a
couple places, but that seems acceptable. Do you agree?

Yes, that looks much neater.

In mcv_get_match_bitmap(), perhaps do the RESULT_IS_FINAL() checks
first in each loop.

This is how master already does that now, and I wonder if it's done in
this order intentionally. It's not clear to me doing it in the other way
would be faster?

Ah OK, it just felt more natural to do it the other way round. I
suppose though, that for the first clause, the is-final check isn't
going to catch anything, whereas the is-null checks might. For the
remaining clauses, it will depend on the data as to which way is
faster, but it probably isn't going to make any noticeable difference
either way. So, although it initially seems a bit counter-intuitive,
it's probably better the way it is.

I guess the last thing is maybe mentioning this in
the docs, adding an example etc.

Yeah, good idea.

Regards,
Dean

#12Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tomas Vondra (#10)
2 attachment(s)
Re: Use extended statistics to estimate (Var op Var) clauses

On Jul 20, 2021, at 11:28 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
<0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch>

Hi Tomas,

I tested this patch against master looking for types of clauses that uniformly get worse with the patch applied. I found some.

The tests are too large to attach, but the scripts that generate them are not. To perform the tests:

git checkout master
perl ./gentest.pl > src/test/regress/sql/gentest.sql
cat /dev/null > src/test/regress/expected/gentest.out
echo "test: gentest" >> src/test/regress/parallel_schedule
./configure && make && make check
cp src/test/regress/results/gentest.out src/test/regress/expected/gentest.out
patch -p 1 < 0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch
make check
cat src/test/regress/regression.diffs | perl ./check.pl

This shows patterns of conditions that get worse, such as:

better:0, worse:80: A < B and A <> A or not A < A
better:0, worse:80: A < B and not A <= A or A <= A
better:0, worse:80: A < B or A = A
better:0, worse:80: A < B or A = A or not A >= A
better:0, worse:80: A < B or A >= A
better:0, worse:80: A < B or A >= A and not A <> A
better:0, worse:80: A < B or not A < A
better:0, worse:80: A < B or not A <> A
better:0, worse:80: A < B or not A <> A or A <= A
better:0, worse:80: A < B or not A >= A or not A < A

It seems things get worse when the conditions contain a column compared against itself. I suspect that is being handled incorrectly.

Attachments:

check.pltext/x-perl-script; name=check.pl; x-unix-mode=0755Download
gentest.pltext/x-perl-script; name=gentest.pl; x-unix-mode=0755Download
#13Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#12)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/9/21 9:19 PM, Mark Dilger wrote:

On Jul 20, 2021, at 11:28 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
<0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch>

Hi Tomas,

I tested this patch against master looking for types of clauses that uniformly get worse with the patch applied. I found some.

The tests are too large to attach, but the scripts that generate them are not. To perform the tests:

git checkout master
perl ./gentest.pl > src/test/regress/sql/gentest.sql
cat /dev/null > src/test/regress/expected/gentest.out
echo "test: gentest" >> src/test/regress/parallel_schedule
./configure && make && make check
cp src/test/regress/results/gentest.out src/test/regress/expected/gentest.out
patch -p 1 < 0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch
make check
cat src/test/regress/regression.diffs | perl ./check.pl

This shows patterns of conditions that get worse, such as:

better:0, worse:80: A < B and A <> A or not A < A
better:0, worse:80: A < B and not A <= A or A <= A
better:0, worse:80: A < B or A = A
better:0, worse:80: A < B or A = A or not A >= A
better:0, worse:80: A < B or A >= A
better:0, worse:80: A < B or A >= A and not A <> A
better:0, worse:80: A < B or not A < A
better:0, worse:80: A < B or not A <> A
better:0, worse:80: A < B or not A <> A or A <= A
better:0, worse:80: A < B or not A >= A or not A < A

It seems things get worse when the conditions contain a column compared against itself. I suspect that is being handled incorrectly.

Thanks for this testing!

I took a quick look, and I think this is mostly due to luck in how the
(default) range estimates combine without and with extended statistics.
Consider for example this simple example:

create table t (a int, b int);

insert into t select mod(i,10), mod(i,20)
from generate_series(1,1000000) s(i);

Without stats, the first clauses example is estimated like this:

explain (timing off, analyze) select * from t
where (A < B and A <> A) or not A < A;

QUERY PLAN
----------------------------------------------------------
Seq Scan on t (cost=0.00..21925.00 rows=554444 width=8)
(actual rows=1000000 loops=1)
Filter: (((a < b) AND (a <> a)) OR (a >= a))
Planning Time: 0.054 ms
Execution Time: 80.485 ms
(4 rows)

and with MCV on (a,b) it gets estimates like this:

QUERY PLAN

----------------------------------------------------------
Seq Scan on t (cost=0.00..21925.00 rows=333333 width=8)
(actual rows=1000000 loops=1)
Filter: (((a < b) AND (a <> a)) OR (a >= a))
Planning Time: 0.152 ms
Execution Time: 79.917 ms
(4 rows)

So with the statistics, the estimate gets a bit worse. The reason is
fairly simple - if you look at the two parts of the OR clause, we get this:

clause actual no stats with stats
---------------------------------------------------------------
(A < B and A <> A) 0 331667 1
not (A < A) 1000000 333333 333333

This clearly shows that the first clause is clearly improved, while the
(A < A) is estimated the same way, because the clause has a single Var
so it's considered to be "simple" so we ignore the MCV selectivity and
just use the simple_sel calculated by clause_selectivity_ext.

And the 333333 and 331667 just happen to be closer to the actual row
count. But that's mostly by luck, clearly.

But now that I think about it, maybe the problem really is in how
statext_mcv_clauselist_selectivity treats this clause - the definition
of "simple" clauses as "has one attnum" was appropriate when only
clauses (Var op Const) were supported. But with (Var op Var) that's
probably not correct anymore.

And indeed, commenting out the if condition on line 1933 (so ignoring
simple_sel) and that does improve the estimates for this query. But
perhaps I'm missing something, this needs more thought.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tomas Vondra (#13)
Re: Use extended statistics to estimate (Var op Var) clauses

On Wed, 11 Aug 2021 at 00:05, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

So with the statistics, the estimate gets a bit worse. The reason is
fairly simple - if you look at the two parts of the OR clause, we get this:

clause actual no stats with stats
---------------------------------------------------------------
(A < B and A <> A) 0 331667 1
not (A < A) 1000000 333333 333333

This clearly shows that the first clause is clearly improved, while the
(A < A) is estimated the same way, because the clause has a single Var
so it's considered to be "simple" so we ignore the MCV selectivity and
just use the simple_sel calculated by clause_selectivity_ext.

And the 333333 and 331667 just happen to be closer to the actual row
count. But that's mostly by luck, clearly.

But now that I think about it, maybe the problem really is in how
statext_mcv_clauselist_selectivity treats this clause - the definition
of "simple" clauses as "has one attnum" was appropriate when only
clauses (Var op Const) were supported. But with (Var op Var) that's
probably not correct anymore.

Hmm, interesting. Clearly the fact that the combined estimate without
extended stats was better was just luck, based on it's large
overestimate of the first clause. But it's also true that a (Var op
Var) clause should not be treated as simple, because "simple" in this
context is meant to be for clauses that are likely to be better
estimated with regular stats, whereas in this case, extended stats
would almost certainly do better on the second clause.

Perhaps the easiest way to identify simple clauses would be in
statext_is_compatible_clause(), rather than the way it's done now,
because it has the relevant information at hand, so it could be made
to return an extra flag.

This feels like rather an artificial example though. Is there any real
use for this sort of clause?

Regards,
Dean

#15Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Dean Rasheed (#14)
Re: Use extended statistics to estimate (Var op Var) clauses

On Aug 11, 2021, at 5:08 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

This feels like rather an artificial example though. Is there any real
use for this sort of clause?

The test generated random combinations of clauses and then checked if any had consistently worse performance. These came up. I don't know that they represent anything real.

What was not random in the tests was the data in the tables. I've gotten curious if these types of clauses (with columns compared against themselves) would still be bad for random rather than orderly data sets. I'll go check....

testing....

Wow. Randomizing the data makes the problems even more extreme. It seems my original test set was actually playing to this patch's strengths, not its weaknesses. I've changed the columns to double precision and filled the columns with random() data, where column1 gets random()^1, column2 gets random()^2, etc. So on average the larger numbered columns will be smaller, and the mcv list will be irrelevant, since values should not tend to repeat.

Over all queries, 47791 have better estimates after the patch, but 34802 had worse estimates after the patch (with the remaining 17407 queries having roughly equal quality).

The worst estimates are still ones that have a column compared to itself:

better:0, worse:33: A <= B or A <= A or A <= A
better:0, worse:33: A <= B or A = A or not A <> A
better:0, worse:33: A <= B or A >= A or not A <> A
better:0, worse:33: A <> B or A <= A
better:0, worse:33: A <> B or A <= A or A <> A
better:0, worse:33: A <> B or A <= A or A >= A
better:0, worse:33: A <> B or A <= A or not A = A
better:0, worse:33: A <> B or A > A or not A < A
better:0, worse:33: A <> B or A >= A
better:0, worse:33: A <> B or A >= A and A <= A
better:0, worse:33: A = B or not A > A or not A > A
better:0, worse:33: A >= B or not A <> A or A = A
better:0, worse:39: B <= A or B <= B or B <= B
better:0, worse:39: B <= A or B = B or not B <> B
better:0, worse:39: B <= A or B >= B or not B <> B
better:0, worse:39: B <> A or B <= B
better:0, worse:39: B <> A or B <= B or B <> B
better:0, worse:39: B <> A or B <= B or B >= B
better:0, worse:39: B <> A or B <= B or not B = B
better:0, worse:39: B <> A or B > B or not B < B
better:0, worse:39: B <> A or B >= B
better:0, worse:39: B <> A or B >= B and B <= B
better:0, worse:39: B = A or not B > B or not B > B
better:0, worse:39: B >= A or not B <> B or B = B

But there are plenty that got worse without that, such as the following examples:

better:25, worse:39: A < B and A < B or B > A
better:10, worse:48: A < B and A < C
better:10, worse:54: A < B and A < C or C > A

I'll go test random data designed to have mcv lists of significance....


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Dean Rasheed (#14)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/11/21 2:08 PM, Dean Rasheed wrote:

On Wed, 11 Aug 2021 at 00:05, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

So with the statistics, the estimate gets a bit worse. The reason is
fairly simple - if you look at the two parts of the OR clause, we get this:

clause actual no stats with stats
---------------------------------------------------------------
(A < B and A <> A) 0 331667 1
not (A < A) 1000000 333333 333333

This clearly shows that the first clause is clearly improved, while the
(A < A) is estimated the same way, because the clause has a single Var
so it's considered to be "simple" so we ignore the MCV selectivity and
just use the simple_sel calculated by clause_selectivity_ext.

And the 333333 and 331667 just happen to be closer to the actual row
count. But that's mostly by luck, clearly.

But now that I think about it, maybe the problem really is in how
statext_mcv_clauselist_selectivity treats this clause - the definition
of "simple" clauses as "has one attnum" was appropriate when only
clauses (Var op Const) were supported. But with (Var op Var) that's
probably not correct anymore.

Hmm, interesting. Clearly the fact that the combined estimate without
extended stats was better was just luck, based on it's large
overestimate of the first clause. But it's also true that a (Var op
Var) clause should not be treated as simple, because "simple" in this
context is meant to be for clauses that are likely to be better
estimated with regular stats, whereas in this case, extended stats
would almost certainly do better on the second clause.

I don't see why extended stats would do better on the second clause. I
mean, if you have (A < A) then extended stats pretty much "collapse"
into per-column stats. We could get almost the same estimate on
single-column MCV list, etc. The reason why that does not happen is that
we just treat it as a range clause, and assign it a default 33% estimate.

But we could make that a bit smarter, and assign better estimates to
those clauses

(A < A) => 0.0
(A = A) => 1.0
(A <= A) => 1.0

And that'd give us the same estimates, I think. Not sure that's worth
it, because (A op A) clauses are probably very rare, OTOH it's cheap.

Perhaps the easiest way to identify simple clauses would be in
statext_is_compatible_clause(), rather than the way it's done now,
because it has the relevant information at hand, so it could be made
to return an extra flag.

Agreed, that seems like a better place to fix this.

This feels like rather an artificial example though. Is there any real
use for this sort of clause?

True. It seems a bit artificial, which is understandable as it came from
a synthetic test generating all possible clauses. OTOH, fixing it seems
fairly cheap ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Mark Dilger (#15)
Re: Use extended statistics to estimate (Var op Var) clauses

On Aug 11, 2021, at 7:51 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

I'll go test random data designed to have mcv lists of significance....

Done. The data for column_i is set to floor(random()^i*20). column_1 therefore is evenly distributed between 0..19, with successive columns weighted more towards smaller values.

This still gives (marginally) worse results than the original test I posted, but better than the completely random data from the last post. After the patch, 72294 estimates got better and 30654 got worse. The biggest losers from this data set are:

better:0, worse:31: A >= B or A = A or not A = A
better:0, worse:31: A >= B or A = A
better:0, worse:31: A >= B or not A <> A
better:0, worse:31: A >= A or A = B or not B = A
better:0, worse:31: A >= B and not A < A or A = A
better:0, worse:31: A = A or not A > B or B <> A
better:0, worse:31: A >= B or not A <> A or not A >= A
better:0, worse:32: B < A and B > C and not C < B <----
better:1, worse:65: A <> C and A <= B <----
better:0, worse:33: B <> A or B >= B
better:0, worse:33: B <> A or B <= B
better:0, worse:33: B <= A or B = B or not B > B
better:0, worse:33: B <> A or not B >= B or not B < B
better:0, worse:33: B = A or not B > B or B = B
better:0, worse:44: A = B or not A > A or A = A
better:0, worse:44: A <> B or A <= A
better:0, worse:44: A <> B or not A >= A or not A < A
better:0, worse:44: A <= B or A = A or not A > A
better:0, worse:44: A <> B or A >= A

Of which, a few do not contain columns compared against themselves, marked with <---- above.

I don't really know what to make of these results. It doesn't bother me that any particular estimate gets worse after the patch. That's just the nature of estimating. But it does bother me a bit that some types of estimates consistently get worse. We should either show that my analysis is wrong about that, or find a way to address it to avoid performance regressions. If I'm right that there are whole classes of estimates that are made consistently worse, then it stands to reason some users will have those data distributions and queries, and could easily notice.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#15)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/11/21 4:51 PM, Mark Dilger wrote:

On Aug 11, 2021, at 5:08 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

This feels like rather an artificial example though. Is there any real
use for this sort of clause?

The test generated random combinations of clauses and then checked if any had consistently worse performance. These came up. I don't know that they represent anything real.

What was not random in the tests was the data in the tables. I've gotten curious if these types of clauses (with columns compared against themselves) would still be bad for random rather than orderly data sets. I'll go check....

testing....

Wow. Randomizing the data makes the problems even more extreme. It seems my original test set was actually playing to this patch's strengths, not its weaknesses. I've changed the columns to double precision and filled the columns with random() data, where column1 gets random()^1, column2 gets random()^2, etc. So on average the larger numbered columns will be smaller, and the mcv list will be irrelevant, since values should not tend to repeat.

Over all queries, 47791 have better estimates after the patch, but 34802 had worse estimates after the patch (with the remaining 17407 queries having roughly equal quality).

The worst estimates are still ones that have a column compared to itself:

better:0, worse:33: A <= B or A <= A or A <= A
better:0, worse:33: A <= B or A = A or not A <> A
better:0, worse:33: A <= B or A >= A or not A <> A
better:0, worse:33: A <> B or A <= A
better:0, worse:33: A <> B or A <= A or A <> A
better:0, worse:33: A <> B or A <= A or A >= A
better:0, worse:33: A <> B or A <= A or not A = A
better:0, worse:33: A <> B or A > A or not A < A
better:0, worse:33: A <> B or A >= A
better:0, worse:33: A <> B or A >= A and A <= A
better:0, worse:33: A = B or not A > A or not A > A
better:0, worse:33: A >= B or not A <> A or A = A
better:0, worse:39: B <= A or B <= B or B <= B
better:0, worse:39: B <= A or B = B or not B <> B
better:0, worse:39: B <= A or B >= B or not B <> B
better:0, worse:39: B <> A or B <= B
better:0, worse:39: B <> A or B <= B or B <> B
better:0, worse:39: B <> A or B <= B or B >= B
better:0, worse:39: B <> A or B <= B or not B = B
better:0, worse:39: B <> A or B > B or not B < B
better:0, worse:39: B <> A or B >= B
better:0, worse:39: B <> A or B >= B and B <= B
better:0, worse:39: B = A or not B > B or not B > B
better:0, worse:39: B >= A or not B <> B or B = B

The other interesting thing all those clauses have in common is that
they're OR clauses. And we handle that a bit differently. But I think
the "strange" clauses with the same Var on both sides is the main issue,
and not detecting them as "simple" clauses should fix that.

But there are plenty that got worse without that, such as the following examples:

better:25, worse:39: A < B and A < B or B > A
better:10, worse:48: A < B and A < C
better:10, worse:54: A < B and A < C or C > A

I'll go test random data designed to have mcv lists of significance....

Hard to say without having a look at the data set, but there'll always
be cases where the extended stats perform a bit worse, due to (a) luck
and (b) the stats covering only small fraction of the table.

But of course, it's worth investigating the suspicious cases.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#15)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/11/21 4:51 PM, Mark Dilger wrote:

On Aug 11, 2021, at 5:08 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

This feels like rather an artificial example though. Is there any real
use for this sort of clause?

The test generated random combinations of clauses and then checked if
any had consistently worse performance. These came up. I don't
know that they represent anything real.

What was not random in the tests was the data in the tables. I've
gotten curious if these types of clauses (with columns compared
against themselves) would still be bad for random rather than orderly
data sets. I'll go check.... >
testing....

Wow. Randomizing the data makes the problems even more extreme. It

seems my original test set was actually playing to this patch's
strengths, not its weaknesses. I've changed the columns to double
precision and filled the columns with random() data, where column1 gets
random()^1, column2 gets random()^2, etc. So on average the larger
numbered columns will be smaller, and the mcv list will be irrelevant,
since values should not tend to repeat.

I tried using the same randomized data set, i.e. essentially

create statistics s (mcv) on a, b, c from t;

insert into t

select random(), pow(random(), 2), pow(random(), 3), pow(random(),4)
from generate_series(1,1000000) s(i);

create statistics s (mcv) on a, b, c from t;

But I don't see any difference compared to the estimates without
extended statistics, which is not surprising because there should be no
MCV list built. So I'm a bit puzzled about the claim that random data
make the problems more extreme. Can you explain?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#17)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/11/21 5:17 PM, Mark Dilger wrote:

On Aug 11, 2021, at 7:51 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

I'll go test random data designed to have mcv lists of significance....

Done. The data for column_i is set to floor(random()^i*20).
column_1 therefore is evenly distributed between 0..19, with
successive columns weighted more towards smaller values.

This still gives (marginally) worse results than the original test I
posted, but better than the completely random data from the last post.
After the patch, 72294 estimates got better and 30654 got worse. The
biggest losers from this data set are:

better:0, worse:31: A >= B or A = A or not A = A
better:0, worse:31: A >= B or A = A
better:0, worse:31: A >= B or not A <> A
better:0, worse:31: A >= A or A = B or not B = A
better:0, worse:31: A >= B and not A < A or A = A
better:0, worse:31: A = A or not A > B or B <> A
better:0, worse:31: A >= B or not A <> A or not A >= A
better:0, worse:32: B < A and B > C and not C < B <----
better:1, worse:65: A <> C and A <= B <----
better:0, worse:33: B <> A or B >= B
better:0, worse:33: B <> A or B <= B
better:0, worse:33: B <= A or B = B or not B > B
better:0, worse:33: B <> A or not B >= B or not B < B
better:0, worse:33: B = A or not B > B or B = B
better:0, worse:44: A = B or not A > A or A = A
better:0, worse:44: A <> B or A <= A
better:0, worse:44: A <> B or not A >= A or not A < A
better:0, worse:44: A <= B or A = A or not A > A
better:0, worse:44: A <> B or A >= A

Of which, a few do not contain columns compared against themselves,
marked with <---- above.

I don't really know what to make of these results. It doesn't
bother me that any particular estimate gets worse after the patch.
That's just the nature of estimating. But it does bother me a bit
that some types of estimates consistently get worse. We should
either show that my analysis is wrong about that, or find a way to
address it to avoid performance regressions. If I'm right that there
are whole classes of estimates that are made consistently worse, then
it stands to reason some users will have those data distributions and
queries, and could easily notice.

I'm not quite sure that's really a problem. Extended statistics are
meant for correlated columns, and it's mostly expected the estimates may
be a bit worse for random / independent data. The idea is mostly that
statistics will be created only for correlated columns, in which case it
should improve the estimates. I'd be way more concerned if you observed
consistently worse estimates on such data set.

Of course, there may be errors - the incorrect handling of (A op A) is
an example of such issue, probably.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tomas Vondra (#19)
Re: Use extended statistics to estimate (Var op Var) clauses

On Aug 11, 2021, at 10:38 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

So I'm a bit puzzled about the claim that random data make the problems more extreme. Can you explain?

Hmm... you appear to be right.

I changed the gentest.pl script to fill the tables with randomized data, but the random data is being regenerated each test run (since the calls to random() are in the gentest.sql file). Adding an explicit setseed() call in the test to make sure the data is the same before and after applying your patch eliminates the differences.

So there are three tests here. The first tests deterministic orderly data. The second tests deterministic random data without repeats and hence without meaningful mvc. The third tests deterministic random data with rounding into twenty buckets skewed towards lower numbered buckets and hence with both repeats and meaningful mvc.

The original test set:

TOTAL:
better: 77827
worse: 12317

The random test set, with setseed() calls to make it deterministic:

TOTAL:
better: 49708
worse: 19393

The random test set , with setseed() calls to make it deterministic plus rounding into buckets:

TOTAL:
better: 81764
worse: 19594

Once the data is made deterministic, the third set looks slightly better than the first, rather than slightly worse. But almost 20% of the query types still look worse after applying the patch. I'm going to dig deeper into those to see if that conclusion survives bumping up the size of the dataset. It will take quite some time to run the tests with a huge dataset, but I don't see how else to investigate this.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#22Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#21)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/12/21 12:02 AM, Mark Dilger wrote:

...

Once the data is made deterministic, the third set looks slightly
better than the first, rather than slightly worse. But almost 20% of
the query types still look worse after applying the patch. I'm going to
dig deeper into those to see if that conclusion survives bumping up the
size of the dataset. It will take quite some time to run the tests with
a huge dataset, but I don't see how else to investigate this.

As I said in my last reply, I'm not sure it's particularly useful to
look at overall results from data sets with independent columns. That's
not what extended statistics are for, and people should not create them
in those cases ...

Maybe it'd be better to focus on cases with the largest difference in
estimates, and investigate those more closely.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#23Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tomas Vondra (#22)
Re: Use extended statistics to estimate (Var op Var) clauses

On Aug 11, 2021, at 3:45 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

As I said in my last reply, I'm not sure it's particularly useful to look at overall results from data sets with independent columns. That's not what extended statistics are for, and people should not create them in those cases ...

We sent our last emails more or less simultaneously. I'm not ignoring your email; I just hadn't seen it yet when I sent mine.

Maybe it'd be better to focus on cases with the largest difference in estimates, and investigate those more closely.

Yeah, I'm working on a correlated stats test as I write this. I'll get back to you when I have results.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#24Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#23)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/12/21 12:48 AM, Mark Dilger wrote:

On Aug 11, 2021, at 3:45 PM, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

As I said in my last reply, I'm not sure it's particularly useful
to look at overall results from data sets with independent columns.
That's not what extended statistics are for, and people should not
create them in those cases ...

We sent our last emails more or less simultaneously. I'm not
ignoring your email; I just hadn't seen it yet when I sent mine.

Apologies, I didn't mean to imply you're ignoring my messages.

Maybe it'd be better to focus on cases with the largest difference
in estimates, and investigate those more closely.

Yeah, I'm working on a correlated stats test as I write this. I'll
get back to you when I have results.

Cool, thanks!

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#25Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Mark Dilger (#23)
Re: Use extended statistics to estimate (Var op Var) clauses

On Aug 11, 2021, at 3:48 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

I'm working on a correlated stats test as I write this. I'll get back to you when I have results.

Ok, the tests showed no statistically significant regressions. All tests included the same sorts of whereclause expressions as used in the tests from yesterday's email.

The first test created loosely correlated data and found no significant row estimate improvements or regressions.

The second test of more tightly correlated data showed a row estimate improvement overall, with no class of whereclause showing an estimate regression. I think the apparent regressions from yesterday were just statistical noise.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#26Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#25)
2 attachment(s)
Re: Use extended statistics to estimate (Var op Var) clauses

Hi Mark,

This thread inspired me to do something fairly similar - a generator
that generates queries of varying complexity, executes them on table
with and without extended statistics. I've been thinking about that
before, but this finally pushed me to do that, and some of the results
are fairly interesting ...

I've pushed everything (generator and results) to this github repo:

https://github.com/tvondra/stats-test

with a summary of all results here:

https://github.com/tvondra/stats-test/blob/master/results.md

Some basic facts about the generator.py (see query_generator):

* It's using a fixed seed to make it deterministic.

* A small fraction of generated queries is sampled and executed (5%).

* Thanks to a fixed seed we generate/sample the same set of queries for
different runs, which allows us to compare runs easily.

* The queries use 2 - 5 clauses, either (Var op Const) or (Var op Var).

* The operators are the usual equality/inequality ones.

* The clauses are combined using AND/OR (also randomly picked).

* There's a random set of parens added, to vary the operator precedence
(otherwise it'd be driven entirely by AND/OR).

* There are two datasets - a random and correlated one, with different
number of distinct values in each column (10, 100, 1000, 10000).

* The statistics target is set to 10, 100, 1000, 10000.

It's a bit hacky, with various bits hard-coded at the moment. But it
could be extended to do other stuff fairly easily, I think.

Anyway, the repository contains results for three cases:

1) master
2) patched: master with the (Var op Var) patch
3) fixed: patched, with a fix for "simple" clauses (a crude patch)

And for each case we have three row counts:

* actual (from explain analyze)
* estimate without extended stats
* estimate with extended stats

And then we can calculate "estimation error" as

estimate / actual

both with and without statistics. Results for two cases can be plotted
as a scatter plot, with the two estimation errors as (x,y) values. The
idea is that this shows how a patch affects estimates - a point (100,10)
means that it was 100x over-estimated, and with the patch it's just 10x,
and similarly for other points.

This is what the charts at

https://github.com/tvondra/stats-test/blob/master/results.md

do, for each combination of parameters (dataset, statistics target and
number of distinct values). There's one chart without extended stats,
one with extended stats.

An "ideal" chart would look like like a single point (1,1) which means
"accurate estimates without/with patch", or (?,1) which means "poor
estimates before, accurate estimates now". Diagonal means "no change".

In principle, we expect the charts to look like this:

* random: diagonal charts, because there should be no extended stats
built, hence no impact on estimates is expected

* correlated: getting closer to 1.0, which looks like a horizontal line
in the chart

Consider for example this:

https://github.com/tvondra/stats-test/raw/master/correlated-1000-10.png

which clearly shows that the first patch is almost exactly the same as
master, while with the fix the estimates improve significantly (and are
almost perfect), at least with the statistics.

Without stats there's a bunch of queries that suddenly get from
"perfect" to much worse (looks like a vertical line on the left chart).

But there are other "strange" cases with "interesting patterns", like
for example

*
https://raw.githubusercontent.com/tvondra/stats-test/master/correlated-100-100.png

*
https://raw.githubusercontent.com/tvondra/stats-test/master/correlated-1000-100.png

*
https://raw.githubusercontent.com/tvondra/stats-test/master/random-10000-10.png

This likely shows the patches are a significant improvement for some
queries (either getting better than master, or even making the estimates
pretty accurate). But it's probably worth looking into the queries that
got worse, etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-patch-20210720.patchtext/x-patch; charset=UTF-8; name=0001-patch-20210720.patchDownload
From 4abb267dfdd46e5c2157ff6d201c11d068331519 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 15 Aug 2021 13:23:13 +0200
Subject: [PATCH 1/2] patch 20210720

---
 src/backend/optimizer/path/clausesel.c        |  37 +++-
 src/backend/statistics/extended_stats.c       |  83 ++++++---
 src/backend/statistics/mcv.c                  | 172 +++++++++++++-----
 .../statistics/extended_stats_internal.h      |   4 +-
 src/test/regress/expected/stats_ext.out       |  96 ++++++++++
 src/test/regress/sql/stats_ext.sql            |  26 +++
 6 files changed, 341 insertions(+), 77 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d263ecf082..6a7e9ceea5 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -714,6 +714,7 @@ clause_selectivity_ext(PlannerInfo *root,
 	Selectivity s1 = 0.5;		/* default for any unhandled clause type */
 	RestrictInfo *rinfo = NULL;
 	bool		cacheable = false;
+	Node	   *src = clause;
 
 	if (clause == NULL)			/* can this still happen? */
 		return s1;
@@ -871,11 +872,37 @@ clause_selectivity_ext(PlannerInfo *root,
 		}
 		else
 		{
-			/* Estimate selectivity for a restriction clause. */
-			s1 = restriction_selectivity(root, opno,
-										 opclause->args,
-										 opclause->inputcollid,
-										 varRelid);
+			/*
+			 * It might be a single (Expr op Expr) clause, which goes here due
+			 * to the optimization at the beginning of clauselist_selectivity.
+			 * So we try applying extended stats first, and then fall back to
+			 * restriction_selectivity.
+			 */
+			bool	estimated = false;
+
+			if (use_extended_stats)
+			{
+				List	   *clauses = list_make1(src);
+				RelOptInfo *rel = find_single_rel_for_clauses(root, clauses);
+
+				if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+				{
+					Bitmapset  *estimatedclauses = NULL;
+
+					s1 = statext_clauselist_selectivity(root, clauses, varRelid,
+														jointype, sjinfo, rel,
+														&estimatedclauses, false);
+
+					estimated = (bms_num_members(estimatedclauses) == 1);
+				}
+			}
+
+			/* Estimate selectivity for a restriction clause (fallback). */
+			if (!estimated)
+				s1 = restriction_selectivity(root, opno,
+											 opclause->args,
+											 opclause->inputcollid,
+											 varRelid);
 		}
 
 		/*
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 2e55913bc8..606cf8c588 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1347,19 +1347,27 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		return true;
 	}
 
-	/* (Var/Expr op Const) or (Const op Var/Expr) */
+	/*
+	 * Three opclause variants are supported: (Expr op Const), (Const op Expr),
+	 * (Expr op Expr). That means we may need to analyze one or two expressions
+	 * to make sure the opclause is compatible with extended stats.
+	 */
 	if (is_opclause(clause))
 	{
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
-		Node	   *clause_expr;
+		ListCell   *lc;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
-		/* Check if the expression has the right shape */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		/*
+		 * Check if the expression has the right shape. This returns either one
+		 * or two expressions, depending on whether there is a Const.
+		 */
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
 		/*
@@ -1399,13 +1407,31 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			!get_func_leakproof(get_opcode(expr->opno)))
 			return false;
 
-		/* Check (Var op Const) or (Const op Var) clauses by recursing. */
-		if (IsA(clause_expr, Var))
-			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+		/*
+		 * There's always at least one expression, otherwise the clause would
+		 * not be considered compatible.
+		 */
+		Assert(list_length(clause_exprs) >= 1);
+
+		/*
+		 * Check all expressions by recursing. Var expressions are handled as
+		 * a special case (to match it to attnums etc.)
+		 */
+		foreach (lc, clause_exprs)
+		{
+			Node *clause_expr = (Node *) lfirst(lc);
+
+			if (IsA(clause_expr, Var))
+			{
+				/* if the Var is incompatible, the whole clause is incompatible */
+				if (!statext_is_compatible_clause_internal(root, clause_expr,
+														   relid, attnums, exprs))
+					return false;
+			}
+			else	/* generic expression */
+				*exprs = lappend(*exprs, clause_expr);
+		}
 
-		/* Otherwise we have (Expr op Const) or (Const op Expr). */
-		*exprs = lappend(*exprs, clause_expr);
 		return true;
 	}
 
@@ -1415,15 +1441,21 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) clause;
 		Node	   *clause_expr;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
+		/* There has to be one expression exactly. */
+		Assert(list_length(clause_exprs) == 1);
+
+		clause_expr = (Node *) linitial(clause_exprs);
+
 		/*
 		 * If it's not one of the supported operators ("=", "<", ">", etc.),
 		 * just ignore the clause, as it's not compatible with MCV lists.
@@ -2009,20 +2041,19 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  * examine_opclause_args
  *		Split an operator expression's arguments into Expr and Const parts.
  *
- * Attempts to match the arguments to either (Expr op Const) or (Const op
- * Expr), possibly with a RelabelType on top. When the expression matches this
- * form, returns true, otherwise returns false.
+ * Attempts to match the arguments to either (Expr op Const) or (Const op Expr)
+ * or (Expr op Expr), possibly with a RelabelType on top. When the expression
+ * matches this form, returns true, otherwise returns false.
  *
  * Optionally returns pointers to the extracted Expr/Const nodes, when passed
- * non-null pointers (exprp, cstp and expronleftp). The expronleftp flag
+ * non-null pointers (exprsp, cstp and expronleftp). The expronleftp flag
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, Node **exprp, Const **cstp,
-					  bool *expronleftp)
+examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp)
 {
-	Node	   *expr;
-	Const	   *cst;
+	List	   *exprs = NIL;
+	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
@@ -2042,22 +2073,26 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 
 	if (IsA(rightop, Const))
 	{
-		expr = (Node *) leftop;
+		exprs = lappend(exprs, leftop);
 		cst = (Const *) rightop;
 		expronleft = true;
 	}
 	else if (IsA(leftop, Const))
 	{
-		expr = (Node *) rightop;
+		exprs = lappend(exprs, rightop);
 		cst = (Const *) leftop;
 		expronleft = false;
 	}
 	else
-		return false;
+	{
+		exprs = lappend(exprs, leftop);
+		exprs = lappend(exprs, rightop);
+		expronleft = false;
+	}
 
 	/* return pointers to the extracted parts if requested */
-	if (exprp)
-		*exprp = expr;
+	if (exprsp)
+		*exprsp = exprs;
 
 	if (cstp)
 		*cstp = cst;
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index ef118952c7..85f650f572 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1645,78 +1645,154 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			Node	   *clause_expr2;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
-			int			idx;
 			Oid			collid;
 
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
-			/* match the attribute/expression to a dimension of the statistic */
-			idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+			if (cst)	/* Expr op Const */
+			{
+				int idx;
 
-			Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert(list_length(clause_exprs) == 1);
+				clause_expr = (Node *) linitial(clause_exprs);
 
-			/*
-			 * Walk through the MCV items and evaluate the current clause. We
-			 * can skip items that were already ruled out, and terminate if
-			 * there are no remaining MCV items that might possibly match.
-			 */
-			for (i = 0; i < mcvlist->nitems; i++)
-			{
-				bool		match = true;
-				MCVItem    *item = &mcvlist->items[i];
+				/* match the attribute/expression to a dimension of the statistic */
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
 
-				Assert(idx >= 0);
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * When the MCV item or the Const value is NULL we can treat
-				 * this as a mismatch. We must not call the operator because
-				 * of strictness.
+				 * Walk through the MCV items and evaluate the current clause. We
+				 * can skip items that were already ruled out, and terminate if
+				 * there are no remaining MCV items that might possibly match.
 				 */
-				if (item->isnull[idx] || cst->constisnull)
+				for (i = 0; i < mcvlist->nitems; i++)
 				{
-					matches[i] = RESULT_MERGE(matches[i], is_or, false);
-					continue;
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					Assert(idx >= 0);
+
+					/*
+					 * When the MCV item or the Const value is NULL we can treat
+					 * this as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || cst->constisnull)
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap. Once
+					 * the value gets false for AND-lists, or true for OR-lists,
+					 * we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics, but
+					 * we can use the collation for the attribute itself, as
+					 * stored in varcollid. We do reset the statistics after a
+					 * type change (including collation change), so this is OK.
+					 * For expressions, we use the collation extracted from the
+					 * expression itself.
+					 */
+					if (expronleft)
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   item->values[idx],
+															   cst->constvalue));
+					else
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   cst->constvalue,
+															   item->values[idx]));
+
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
 				}
+			}
+			else	/* Expr op Expr */
+			{
+				int			idx;
+				int			idx2;
+
+				Assert(list_length(clause_exprs) == 2);
+
+				clause_expr = (Node *) linitial(clause_exprs);
+				clause_expr2 = (Node *) lsecond(clause_exprs);
+
+				Assert(clause_expr2);
+				Assert(!expronleft);
 
 				/*
-				 * Skip MCV items that can't change result in the bitmap. Once
-				 * the value gets false for AND-lists, or true for OR-lists,
-				 * we don't need to look at more clauses.
+				 * Match the expressions to a dimension of the statistic.
+				 *
+				 * XXX Can the collations differ?
 				 */
-				if (RESULT_IS_FINAL(matches[i], is_or))
-					continue;
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+				idx2 = mcv_match_expression(clause_expr2, keys, exprs, &collid);
+
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert((idx2 >= 0) && (idx2 < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * First check whether the constant is below the lower
-				 * boundary (in that case we can skip the bucket, because
-				 * there's no overlap).
-				 *
-				 * We don't store collations used to build the statistics, but
-				 * we can use the collation for the attribute itself, as
-				 * stored in varcollid. We do reset the statistics after a
-				 * type change (including collation change), so this is OK.
-				 * For expressions, we use the collation extracted from the
-				 * expression itself.
+				 * Walk through the MCV items and evaluate the current clause.
+				 * We can skip items that were already ruled out, and
+				 * terminate if there are no remaining MCV items that might
+				 * possibly match.
 				 */
-				if (expronleft)
+				for (i = 0; i < mcvlist->nitems; i++)
+				{
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					/*
+					 * When either of the MCV items is NULL we can treat this
+					 * as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || item->isnull[idx2])
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap.
+					 * Once the value gets false for AND-lists, or true for
+					 * OR-lists, we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics,
+					 * but we can use the collation for the attribute itself,
+					 * as stored in varcollid. We do reset the statistics after
+					 * a type change (including collation change), so this is
+					 * OK. We may need to relax this after allowing extended
+					 * statistics on expressions.
+					 */
 					match = DatumGetBool(FunctionCall2Coll(&opproc,
 														   collid,
 														   item->values[idx],
-														   cst->constvalue));
-				else
-					match = DatumGetBool(FunctionCall2Coll(&opproc,
-														   collid,
-														   cst->constvalue,
-														   item->values[idx]));
+														   item->values[idx2]));
 
-				/* update the match bitmap with the result */
-				matches[i] = RESULT_MERGE(matches[i], is_or, match);
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
+				}
 			}
 		}
 		else if (IsA(clause, ScalarArrayOpExpr))
@@ -1726,6 +1802,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
 			Oid			collid;
@@ -1743,11 +1820,14 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
 			Assert(expronleft);
+			Assert(list_length(clause_exprs) == 1);
+
+			clause_expr = (Node *) linitial(clause_exprs);
 
 			/* XXX what if (cst->constisnull == NULL)? */
 			if (!cst->constisnull)
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 55cd9252a5..1f30fa9060 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -99,8 +99,8 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									MultiSortSupport mss,
 									int numattrs, AttrNumber *attnums);
 
-extern bool examine_opclause_args(List *args, Node **exprp,
-								  Const **cstp, bool *expronleftp);
+extern bool examine_opclause_args(List *args, List **exprs, Const **cstp,
+								  bool *expronleftp);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
 											 Selectivity mcv_sel,
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 7fb54de53d..301227079c 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1904,6 +1904,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        343 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      1667 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2051,6 +2063,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        200 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      3750 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2460,6 +2484,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       3750 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+        25 |   2500
+(1 row)
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 ANALYZE mcv_lists;
@@ -2493,6 +2523,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       2500 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -2587,6 +2623,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
       1094 |      0
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      9950 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 ANALYZE mcv_lists_bool;
@@ -2742,6 +2790,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 (1 row)
 
 DROP TABLE mcv_lists_partial;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -2801,6 +2861,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       2649 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+         1 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
 CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
@@ -2847,6 +2925,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       1571 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
 DROP TABLE mcv_lists_multi;
 -- statistics on integer expressions
 CREATE TABLE expr_stats (a int, b int, c int);
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index d563c4654c..b0cb0f1924 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -946,6 +946,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -999,6 +1003,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -1198,6 +1206,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 
@@ -1216,6 +1226,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -1291,6 +1303,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 
@@ -1376,6 +1392,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 
 DROP TABLE mcv_lists_partial;
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -1403,6 +1423,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
@@ -1417,6 +1440,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 DROP TABLE mcv_lists_multi;
 
-- 
2.31.1

0002-fix-don-t-treat-Var-op-Var-as-simple-clauses.patchtext/x-patch; charset=UTF-8; name=0002-fix-don-t-treat-Var-op-Var-as-simple-clauses.patchDownload
From e3fa97bfc187fa86f0fe2a7fe816dc79339a2a57 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Wed, 18 Aug 2021 11:53:45 +0200
Subject: [PATCH 2/2] fix: don't treat Var op Var as simple clauses

---
 src/backend/statistics/extended_stats.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 606cf8c588..c66845887d 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1677,6 +1677,22 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 	return true;
 }
 
+static bool
+stat_clause_is_simple(Node *clause)
+{
+	List *vars;
+
+	if (IsA(clause, RestrictInfo))
+		clause = (Node *) ((RestrictInfo *) clause)->clause;
+
+	if (!is_opclause(clause) && !IsA(clause, Var))
+		return false;
+
+	vars = pull_var_clause(clause, 0);
+
+	return (list_length(vars) == 1);
+}
+
 /*
  * statext_mcv_clauselist_selectivity
  *		Estimate clauses using the best multi-column statistics.
@@ -1797,6 +1813,8 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 		listidx = -1;
 		foreach(l, clauses)
 		{
+			Node *clause = (Node *) lfirst(l);
+
 			/* Increment the index before we decide if to skip the clause. */
 			listidx++;
 
@@ -1836,12 +1854,13 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 			if ((list_attnums[listidx] == NULL &&
 				 list_length(list_exprs[listidx]) == 1) ||
 				(list_exprs[listidx] == NIL &&
-				 bms_membership(list_attnums[listidx]) == BMS_SINGLETON))
+				 bms_membership(list_attnums[listidx]) == BMS_SINGLETON &&
+				 stat_clause_is_simple(clause)))
 				simple_clauses = bms_add_member(simple_clauses,
 												list_length(stat_clauses));
 
 			/* add clause to list and mark it as estimated */
-			stat_clauses = lappend(stat_clauses, (Node *) lfirst(l));
+			stat_clauses = lappend(stat_clauses, clause);
 			*estimatedclauses = bms_add_member(*estimatedclauses, listidx);
 
 			/*
-- 
2.31.1

#27Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tomas Vondra (#26)
Re: Use extended statistics to estimate (Var op Var) clauses

On Aug 18, 2021, at 3:43 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

I've pushed everything (generator and results) to this github repo

Thanks for the link. I took a very brief look. Perhaps we can combine efforts. I need to make progress on several other patches first, but hope to get back to this.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#28Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#27)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/18/21 3:16 PM, Mark Dilger wrote:

On Aug 18, 2021, at 3:43 AM, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I've pushed everything (generator and results) to this github repo

Thanks for the link. I took a very brief look. Perhaps we can
combine efforts. I need to make progress on several other patches
first, but hope to get back to this.

Sure - it'd be great to combine efforts. That's why I posted my scripts
& results. I understand there's plenty other work for both of us, so
take your time - no rush.

After looking at this for a while, it's clear the main issue is handling
of clauses referencing the same Var twice, like for example (a = a) or
(a < a). But it's not clear to me if this is something worth fixing, or
if extended statistics is the right place to do it.

If those clauses are worth the effort, why not to handle them better
even without extended statistics? We can easily evaluate these clauses
on per-column MCV, because they only reference a single Var.

It'd be rather strange if for example

select * from t where (a < a)

is mis-estimated simply because it can't use extended statistics
(there's just a single Var, so we won't consider extended stats), while

select * from t where (a < a) and b = 1

suddenly gets much better thanks to extended stats on (a,b), even when
(a,b) are perfectly independent.

So I think we better make eqsel/ineqsel smarter about estimating those
clauses, assuming we consider them important enough.

I think we can either reject the patch, which would mean we don't
consider (Var op Var) clauses to be common/important enough. Or we need
to improve the existing selectivity functions (even those without
extended statistics) to handle those clauses in a smarter way. Otherwise
there'd be strange/surprising inconsistencies.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#29Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tomas Vondra (#28)
Re: Use extended statistics to estimate (Var op Var) clauses

On Aug 20, 2021, at 11:20 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

I think we can either reject the patch, which would mean we don't consider (Var op Var) clauses to be common/important enough. Or we need to improve the existing selectivity functions (even those without extended statistics) to handle those clauses in a smarter way. Otherwise there'd be strange/surprising inconsistencies.

For datatypes with very few distinct values (bool, some enums, etc.) keeping an mcv list of (a,b) pairs seems helpful. The patch may be worth keeping for such cases. In other cases, I don't much see the point.

It seems that sampling the fraction of rows where (A op B) is true for any given op would be more helpful.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#30Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#28)
Re: Use extended statistics to estimate (Var op Var) clauses

On Fri, Aug 20, 2021 at 2:21 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

After looking at this for a while, it's clear the main issue is handling
of clauses referencing the same Var twice, like for example (a = a) or
(a < a). But it's not clear to me if this is something worth fixing, or
if extended statistics is the right place to do it.

If those clauses are worth the effort, why not to handle them better
even without extended statistics? We can easily evaluate these clauses
on per-column MCV, because they only reference a single Var.

+1.

It seems to me that what we ought to do is make "a < a", "a > a", and
"a != 0" all have an estimate of zero, and make "a <= a", "a >= a",
and "a = a" estimate 1-nullfrac. The extended statistics mechanism can
just ignore the first three types of clauses; the zero estimate has to
be 100% correct. It can't necessarily ignore the second three cases,
though. If the query says "WHERE a = a AND b = 1", "b = 1" may be more
or less likely given that a is known to be not null, and extended
statistics can tell us that.

--
Robert Haas
EDB: http://www.enterprisedb.com

#31Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#26)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/18/21 12:43 PM, Tomas Vondra wrote:

Hi Mark,

This thread inspired me to do something fairly similar - a generator
that generates queries of varying complexity, executes them on table
with and without extended statistics. I've been thinking about that
before, but this finally pushed me to do that, and some of the results
are fairly interesting ...

I've pushed everything (generator and results) to this github repo:

  https://github.com/tvondra/stats-test

with a summary of all results here:

  https://github.com/tvondra/stats-test/blob/master/results.md

FWIW I've pushed slightly reworked scripts and results - there are
results from two machines - xeon and i5. Xeon is mostly the same as
before, with some minor fixes, while i5 is does not allow clauses
referencing the same column twice (per discussion in this thread).

I think there was a bug in the original plot script, combining incorrect
data series in some cases, causing (at least) some of the strange
patterns mentioned.

I've also made the charts easier to read by splitting the cases into
separate plots and using transparency. I've also added png version back,
because plotting the .svg is quite slow.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#32Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Robert Haas (#30)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/20/21 8:56 PM, Robert Haas wrote:

On Fri, Aug 20, 2021 at 2:21 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

After looking at this for a while, it's clear the main issue is handling
of clauses referencing the same Var twice, like for example (a = a) or
(a < a). But it's not clear to me if this is something worth fixing, or
if extended statistics is the right place to do it.

If those clauses are worth the effort, why not to handle them better
even without extended statistics? We can easily evaluate these clauses
on per-column MCV, because they only reference a single Var.

+1.

It seems to me that what we ought to do is make "a < a", "a > a", and
"a != 0" all have an estimate of zero, and make "a <= a", "a >= a",
and "a = a" estimate 1-nullfrac. The extended statistics mechanism can
just ignore the first three types of clauses; the zero estimate has to
be 100% correct. It can't necessarily ignore the second three cases,
though. If the query says "WHERE a = a AND b = 1", "b = 1" may be more
or less likely given that a is known to be not null, and extended
statistics can tell us that.

Yeah, I agree this seems like the right approach (except I guess you
meant "a != a" and not "a != 0"). Assuming we want to do something about
these clauses at all - I'm still wondering if those clauses are common
in practice or just synthetic.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#33Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#32)
Re: Use extended statistics to estimate (Var op Var) clauses

On Fri, Aug 20, 2021 at 3:32 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Yeah, I agree this seems like the right approach (except I guess you
meant "a != a" and not "a != 0").

Err, yes.

Assuming we want to do something about
these clauses at all - I'm still wondering if those clauses are common
in practice or just synthetic.

Well, they are certainly less common than some things, but query
generators do a lot of wonky things.

Also, as a practical matter, it might be cheaper to do something about
them than to not do something about them. I don't really understand
the mechanism of operation of the patch, but I guess if somebody
writes "WHERE a = b", one thing you could do would be check whether
any of the MCVs for a are also MCVs for b, and if so you could
estimate something on that basis. If you happened to have extended
statistics for (a, b) then I guess you could do even better using, uh,
math, or something. But all of that sounds like hard work, and
checking whether "a" happens to be the same as "b" sounds super-cheap
by comparison.

If, as normally will be the case, the two sides are not the same, you
haven't really lost anything, because the expenditure of cycles to
test varno and varattno for equality must be utterly trivial in
comparison with fetching stats data and looping over MCV lists and
things. But if on occasion you find out that they are the same, then
you win! You can give a more accurate estimate with less computational
work.

--
Robert Haas
EDB: http://www.enterprisedb.com

#34Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Robert Haas (#33)
3 attachment(s)
Re: Use extended statistics to estimate (Var op Var) clauses

Hi,

The attached patch series is modified to improve estimates for these
special clauses (Var op Var with the same var on both sides) without
extended statistics. This is done in 0001, and it seems fairly simple
and cheap.

The 0002 part is still the same patch as on 2021/07/20. Part 0003 fixes
handling of those clauses so that we don't treat them as simple, but it
does that by tweaking statext_is_compatible_clause(), as suggested by
Dean. It does work, although it's a bit more invasive than simply
checking the shape of clause in statext_mcv_clauselist_selectivity.

I do have results for the randomly generated queries, and this does
improve the situation a lot - pretty much all the queries with (a=a) or
(a<a) clauses had terrible estimates, and this fixes that.

That being said, I'm still not sure if this is an issue in real-world
applications, or whether we're solving something because of synthetic
queries generated by the randomized generator. But the checks seem
fairly cheap, so maybe it doesn't matter too much.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Improve-estimates-for-Var-op-Var-with-the-s-20210828.patchtext/x-patch; charset=UTF-8; name=0001-Improve-estimates-for-Var-op-Var-with-the-s-20210828.patchDownload
From 708587e495483aff4b84487103a545b6e860d0c0 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Thu, 26 Aug 2021 23:01:04 +0200
Subject: [PATCH 1/3] Improve estimates for Var op Var with the same Var

When estimating (Var op Var) conditions, we can treat the case with the
same Var on both sides as a special case, and we can provide better
selectivity estimate than for the generic case.

For example for (a = a) we know it's 1.0, because all rows are expected
to match. Similarly for (a != a) , wich has selectivity 0.0. And the
same logic can be applied to inequality comparisons, like (a < a) etc.

In principle, those clauses are a bit strange and queries are unlikely
to use them. But query generators sometimes do silly things, and these
checks are quite cheap so it's likely a win.
---
 src/backend/utils/adt/selfuncs.c | 75 +++++++++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 0c8c05f6c2..0baeaa040d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -210,7 +210,8 @@ static bool get_actual_variable_endpoint(Relation heapRel,
 										 MemoryContext outercontext,
 										 Datum *endpointDatum);
 static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
-
+static bool matching_restriction_variables(PlannerInfo *root, List *args,
+										   int varRelid);
 
 /*
  *		eqsel			- Selectivity of "=" for any data types.
@@ -256,6 +257,14 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
 		}
 	}
 
+	/*
+	 * It it's (variable = variable) with the same variable on both sides, it's
+	 * a special case and we know it does not filter anything, which means the
+	 * selectivity is 1.0.
+	 */
+	if (matching_restriction_variables(root, args, varRelid))
+		return (negate) ? 0.0 : 1.0;
+
 	/*
 	 * If expression is not variable = something or something = variable, then
 	 * punt and return a default estimate.
@@ -1408,6 +1417,20 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
 	Oid			consttype;
 	double		selec;
 
+	/*
+	 * Handle (variable < variable) and (variable <= variable) with the same
+	 * variable on both sides as a special case. The strict inequality should
+	 * not match anything, hence selectivity is 0.0. The other case is about
+	 * the same as equality, so selectivity is 1.0.
+	 */
+	if (matching_restriction_variables(root, args, varRelid))
+	{
+		if (iseq)
+			PG_RETURN_FLOAT8(1.0);
+		else
+			PG_RETURN_FLOAT8(0.0);
+	}
+
 	/*
 	 * If expression is not variable op something or something op variable,
 	 * then punt and return a default estimate.
@@ -4871,6 +4894,56 @@ get_restriction_variable(PlannerInfo *root, List *args, int varRelid,
 	return false;
 }
 
+
+/*
+ * matching_restriction_variable
+ *		Examine the args of a restriction clause to see if it's of the
+ *		form (variable op variable) with the save variable on both sides.
+ *
+ * Inputs:
+ *	root: the planner info
+ *	args: clause argument list
+ *	varRelid: see specs for restriction selectivity functions
+ *
+ * Returns true if the same variable is on both sides, otherwise false.
+ */
+static bool
+matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
+{
+	Node	   *left,
+			   *right;
+	VariableStatData ldata;
+	VariableStatData rdata;
+	bool		res = false;
+
+	/* Fail if not a binary opclause (probably shouldn't happen) */
+	if (list_length(args) != 2)
+		return false;
+
+	left = (Node *) linitial(args);
+	right = (Node *) lsecond(args);
+
+	/*
+	 * Examine both sides.  Note that when varRelid is nonzero, Vars of other
+	 * relations will be treated as pseudoconstants.
+	 */
+	examine_variable(root, left, varRelid, &ldata);
+	examine_variable(root, right, varRelid, &rdata);
+
+	/*
+	 * If both sides are variable, and are equal, we win.
+	 */
+	if ((ldata.rel != NULL && rdata.rel != NULL) &&
+		equal(ldata.var, rdata.var))
+		res = true;
+
+	/* We don't need the stats. */
+	ReleaseVariableStats(ldata);
+	ReleaseVariableStats(rdata);
+
+	return res;
+}
+
 /*
  * get_join_variables
  *		Apply examine_variable() to each side of a join clause.
-- 
2.31.1

0002-patch-20210720-20210828.patchtext/x-patch; charset=UTF-8; name=0002-patch-20210720-20210828.patchDownload
From caa5c2aa0e640097a43e87f7d9557a640e6e8948 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 15 Aug 2021 13:23:13 +0200
Subject: [PATCH 2/3] patch 20210720

---
 src/backend/optimizer/path/clausesel.c        |  37 +++-
 src/backend/statistics/extended_stats.c       |  83 ++++++---
 src/backend/statistics/mcv.c                  | 172 +++++++++++++-----
 .../statistics/extended_stats_internal.h      |   4 +-
 src/test/regress/expected/stats_ext.out       |  96 ++++++++++
 src/test/regress/sql/stats_ext.sql            |  26 +++
 6 files changed, 341 insertions(+), 77 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d263ecf082..6a7e9ceea5 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -714,6 +714,7 @@ clause_selectivity_ext(PlannerInfo *root,
 	Selectivity s1 = 0.5;		/* default for any unhandled clause type */
 	RestrictInfo *rinfo = NULL;
 	bool		cacheable = false;
+	Node	   *src = clause;
 
 	if (clause == NULL)			/* can this still happen? */
 		return s1;
@@ -871,11 +872,37 @@ clause_selectivity_ext(PlannerInfo *root,
 		}
 		else
 		{
-			/* Estimate selectivity for a restriction clause. */
-			s1 = restriction_selectivity(root, opno,
-										 opclause->args,
-										 opclause->inputcollid,
-										 varRelid);
+			/*
+			 * It might be a single (Expr op Expr) clause, which goes here due
+			 * to the optimization at the beginning of clauselist_selectivity.
+			 * So we try applying extended stats first, and then fall back to
+			 * restriction_selectivity.
+			 */
+			bool	estimated = false;
+
+			if (use_extended_stats)
+			{
+				List	   *clauses = list_make1(src);
+				RelOptInfo *rel = find_single_rel_for_clauses(root, clauses);
+
+				if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+				{
+					Bitmapset  *estimatedclauses = NULL;
+
+					s1 = statext_clauselist_selectivity(root, clauses, varRelid,
+														jointype, sjinfo, rel,
+														&estimatedclauses, false);
+
+					estimated = (bms_num_members(estimatedclauses) == 1);
+				}
+			}
+
+			/* Estimate selectivity for a restriction clause (fallback). */
+			if (!estimated)
+				s1 = restriction_selectivity(root, opno,
+											 opclause->args,
+											 opclause->inputcollid,
+											 varRelid);
 		}
 
 		/*
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 2e55913bc8..606cf8c588 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1347,19 +1347,27 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		return true;
 	}
 
-	/* (Var/Expr op Const) or (Const op Var/Expr) */
+	/*
+	 * Three opclause variants are supported: (Expr op Const), (Const op Expr),
+	 * (Expr op Expr). That means we may need to analyze one or two expressions
+	 * to make sure the opclause is compatible with extended stats.
+	 */
 	if (is_opclause(clause))
 	{
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
-		Node	   *clause_expr;
+		ListCell   *lc;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
-		/* Check if the expression has the right shape */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		/*
+		 * Check if the expression has the right shape. This returns either one
+		 * or two expressions, depending on whether there is a Const.
+		 */
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
 		/*
@@ -1399,13 +1407,31 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			!get_func_leakproof(get_opcode(expr->opno)))
 			return false;
 
-		/* Check (Var op Const) or (Const op Var) clauses by recursing. */
-		if (IsA(clause_expr, Var))
-			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+		/*
+		 * There's always at least one expression, otherwise the clause would
+		 * not be considered compatible.
+		 */
+		Assert(list_length(clause_exprs) >= 1);
+
+		/*
+		 * Check all expressions by recursing. Var expressions are handled as
+		 * a special case (to match it to attnums etc.)
+		 */
+		foreach (lc, clause_exprs)
+		{
+			Node *clause_expr = (Node *) lfirst(lc);
+
+			if (IsA(clause_expr, Var))
+			{
+				/* if the Var is incompatible, the whole clause is incompatible */
+				if (!statext_is_compatible_clause_internal(root, clause_expr,
+														   relid, attnums, exprs))
+					return false;
+			}
+			else	/* generic expression */
+				*exprs = lappend(*exprs, clause_expr);
+		}
 
-		/* Otherwise we have (Expr op Const) or (Const op Expr). */
-		*exprs = lappend(*exprs, clause_expr);
 		return true;
 	}
 
@@ -1415,15 +1441,21 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) clause;
 		Node	   *clause_expr;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
+		/* There has to be one expression exactly. */
+		Assert(list_length(clause_exprs) == 1);
+
+		clause_expr = (Node *) linitial(clause_exprs);
+
 		/*
 		 * If it's not one of the supported operators ("=", "<", ">", etc.),
 		 * just ignore the clause, as it's not compatible with MCV lists.
@@ -2009,20 +2041,19 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  * examine_opclause_args
  *		Split an operator expression's arguments into Expr and Const parts.
  *
- * Attempts to match the arguments to either (Expr op Const) or (Const op
- * Expr), possibly with a RelabelType on top. When the expression matches this
- * form, returns true, otherwise returns false.
+ * Attempts to match the arguments to either (Expr op Const) or (Const op Expr)
+ * or (Expr op Expr), possibly with a RelabelType on top. When the expression
+ * matches this form, returns true, otherwise returns false.
  *
  * Optionally returns pointers to the extracted Expr/Const nodes, when passed
- * non-null pointers (exprp, cstp and expronleftp). The expronleftp flag
+ * non-null pointers (exprsp, cstp and expronleftp). The expronleftp flag
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, Node **exprp, Const **cstp,
-					  bool *expronleftp)
+examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp)
 {
-	Node	   *expr;
-	Const	   *cst;
+	List	   *exprs = NIL;
+	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
@@ -2042,22 +2073,26 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 
 	if (IsA(rightop, Const))
 	{
-		expr = (Node *) leftop;
+		exprs = lappend(exprs, leftop);
 		cst = (Const *) rightop;
 		expronleft = true;
 	}
 	else if (IsA(leftop, Const))
 	{
-		expr = (Node *) rightop;
+		exprs = lappend(exprs, rightop);
 		cst = (Const *) leftop;
 		expronleft = false;
 	}
 	else
-		return false;
+	{
+		exprs = lappend(exprs, leftop);
+		exprs = lappend(exprs, rightop);
+		expronleft = false;
+	}
 
 	/* return pointers to the extracted parts if requested */
-	if (exprp)
-		*exprp = expr;
+	if (exprsp)
+		*exprsp = exprs;
 
 	if (cstp)
 		*cstp = cst;
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index ef118952c7..85f650f572 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1645,78 +1645,154 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			Node	   *clause_expr2;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
-			int			idx;
 			Oid			collid;
 
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
-			/* match the attribute/expression to a dimension of the statistic */
-			idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+			if (cst)	/* Expr op Const */
+			{
+				int idx;
 
-			Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert(list_length(clause_exprs) == 1);
+				clause_expr = (Node *) linitial(clause_exprs);
 
-			/*
-			 * Walk through the MCV items and evaluate the current clause. We
-			 * can skip items that were already ruled out, and terminate if
-			 * there are no remaining MCV items that might possibly match.
-			 */
-			for (i = 0; i < mcvlist->nitems; i++)
-			{
-				bool		match = true;
-				MCVItem    *item = &mcvlist->items[i];
+				/* match the attribute/expression to a dimension of the statistic */
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
 
-				Assert(idx >= 0);
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * When the MCV item or the Const value is NULL we can treat
-				 * this as a mismatch. We must not call the operator because
-				 * of strictness.
+				 * Walk through the MCV items and evaluate the current clause. We
+				 * can skip items that were already ruled out, and terminate if
+				 * there are no remaining MCV items that might possibly match.
 				 */
-				if (item->isnull[idx] || cst->constisnull)
+				for (i = 0; i < mcvlist->nitems; i++)
 				{
-					matches[i] = RESULT_MERGE(matches[i], is_or, false);
-					continue;
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					Assert(idx >= 0);
+
+					/*
+					 * When the MCV item or the Const value is NULL we can treat
+					 * this as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || cst->constisnull)
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap. Once
+					 * the value gets false for AND-lists, or true for OR-lists,
+					 * we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics, but
+					 * we can use the collation for the attribute itself, as
+					 * stored in varcollid. We do reset the statistics after a
+					 * type change (including collation change), so this is OK.
+					 * For expressions, we use the collation extracted from the
+					 * expression itself.
+					 */
+					if (expronleft)
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   item->values[idx],
+															   cst->constvalue));
+					else
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   cst->constvalue,
+															   item->values[idx]));
+
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
 				}
+			}
+			else	/* Expr op Expr */
+			{
+				int			idx;
+				int			idx2;
+
+				Assert(list_length(clause_exprs) == 2);
+
+				clause_expr = (Node *) linitial(clause_exprs);
+				clause_expr2 = (Node *) lsecond(clause_exprs);
+
+				Assert(clause_expr2);
+				Assert(!expronleft);
 
 				/*
-				 * Skip MCV items that can't change result in the bitmap. Once
-				 * the value gets false for AND-lists, or true for OR-lists,
-				 * we don't need to look at more clauses.
+				 * Match the expressions to a dimension of the statistic.
+				 *
+				 * XXX Can the collations differ?
 				 */
-				if (RESULT_IS_FINAL(matches[i], is_or))
-					continue;
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+				idx2 = mcv_match_expression(clause_expr2, keys, exprs, &collid);
+
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert((idx2 >= 0) && (idx2 < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * First check whether the constant is below the lower
-				 * boundary (in that case we can skip the bucket, because
-				 * there's no overlap).
-				 *
-				 * We don't store collations used to build the statistics, but
-				 * we can use the collation for the attribute itself, as
-				 * stored in varcollid. We do reset the statistics after a
-				 * type change (including collation change), so this is OK.
-				 * For expressions, we use the collation extracted from the
-				 * expression itself.
+				 * Walk through the MCV items and evaluate the current clause.
+				 * We can skip items that were already ruled out, and
+				 * terminate if there are no remaining MCV items that might
+				 * possibly match.
 				 */
-				if (expronleft)
+				for (i = 0; i < mcvlist->nitems; i++)
+				{
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					/*
+					 * When either of the MCV items is NULL we can treat this
+					 * as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || item->isnull[idx2])
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap.
+					 * Once the value gets false for AND-lists, or true for
+					 * OR-lists, we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics,
+					 * but we can use the collation for the attribute itself,
+					 * as stored in varcollid. We do reset the statistics after
+					 * a type change (including collation change), so this is
+					 * OK. We may need to relax this after allowing extended
+					 * statistics on expressions.
+					 */
 					match = DatumGetBool(FunctionCall2Coll(&opproc,
 														   collid,
 														   item->values[idx],
-														   cst->constvalue));
-				else
-					match = DatumGetBool(FunctionCall2Coll(&opproc,
-														   collid,
-														   cst->constvalue,
-														   item->values[idx]));
+														   item->values[idx2]));
 
-				/* update the match bitmap with the result */
-				matches[i] = RESULT_MERGE(matches[i], is_or, match);
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
+				}
 			}
 		}
 		else if (IsA(clause, ScalarArrayOpExpr))
@@ -1726,6 +1802,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
 			Oid			collid;
@@ -1743,11 +1820,14 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
 			Assert(expronleft);
+			Assert(list_length(clause_exprs) == 1);
+
+			clause_expr = (Node *) linitial(clause_exprs);
 
 			/* XXX what if (cst->constisnull == NULL)? */
 			if (!cst->constisnull)
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 55cd9252a5..1f30fa9060 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -99,8 +99,8 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									MultiSortSupport mss,
 									int numattrs, AttrNumber *attnums);
 
-extern bool examine_opclause_args(List *args, Node **exprp,
-								  Const **cstp, bool *expronleftp);
+extern bool examine_opclause_args(List *args, List **exprs, Const **cstp,
+								  bool *expronleftp);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
 											 Selectivity mcv_sel,
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 7fb54de53d..301227079c 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1904,6 +1904,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        343 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      1667 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2051,6 +2063,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        200 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      3750 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2460,6 +2484,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       3750 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+        25 |   2500
+(1 row)
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 ANALYZE mcv_lists;
@@ -2493,6 +2523,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       2500 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -2587,6 +2623,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
       1094 |      0
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      9950 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 ANALYZE mcv_lists_bool;
@@ -2742,6 +2790,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 (1 row)
 
 DROP TABLE mcv_lists_partial;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -2801,6 +2861,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       2649 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+         1 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
 CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
@@ -2847,6 +2925,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       1571 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
 DROP TABLE mcv_lists_multi;
 -- statistics on integer expressions
 CREATE TABLE expr_stats (a int, b int, c int);
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index d563c4654c..b0cb0f1924 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -946,6 +946,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -999,6 +1003,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -1198,6 +1206,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 
@@ -1216,6 +1226,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -1291,6 +1303,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 
@@ -1376,6 +1392,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 
 DROP TABLE mcv_lists_partial;
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -1403,6 +1423,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
@@ -1417,6 +1440,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 DROP TABLE mcv_lists_multi;
 
-- 
2.31.1

0003-Don-t-treat-Var-op-Var-as-simple-clauses-20210828.patchtext/x-patch; charset=UTF-8; name=0003-Don-t-treat-Var-op-Var-as-simple-clauses-20210828.patchDownload
From 743015add200d8041737f364281139ae0b5a5766 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Wed, 18 Aug 2021 11:53:45 +0200
Subject: [PATCH 3/3] Don't treat Var op Var as simple clauses

---
 src/backend/statistics/extended_stats.c       | 60 ++++++++++++++-----
 src/backend/statistics/mcv.c                  |  4 +-
 .../statistics/extended_stats_internal.h      |  2 +-
 3 files changed, 49 insertions(+), 17 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 606cf8c588..f555ace022 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1319,7 +1319,7 @@ choose_best_statistics(List *stats, char requiredkind,
 static bool
 statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 									  Index relid, Bitmapset **attnums,
-									  List **exprs)
+									  List **exprs, bool *issimple)
 {
 	/* Look inside any binary-compatible relabeling (as in examine_variable) */
 	if (IsA(clause, RelabelType))
@@ -1343,6 +1343,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			return false;
 
 		*attnums = bms_add_member(*attnums, var->varattno);
+		*issimple = true;
 
 		return true;
 	}
@@ -1367,7 +1368,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		 * Check if the expression has the right shape. This returns either one
 		 * or two expressions, depending on whether there is a Const.
 		 */
-		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL, issimple))
 			return false;
 
 		/*
@@ -1420,16 +1421,20 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		foreach (lc, clause_exprs)
 		{
 			Node *clause_expr = (Node *) lfirst(lc);
+			bool	tmp = false;
 
 			if (IsA(clause_expr, Var))
 			{
 				/* if the Var is incompatible, the whole clause is incompatible */
 				if (!statext_is_compatible_clause_internal(root, clause_expr,
-														   relid, attnums, exprs))
+														   relid, attnums, exprs,
+														   &tmp))
 					return false;
 			}
 			else	/* generic expression */
 				*exprs = lappend(*exprs, clause_expr);
+
+			*issimple = (*issimple && tmp);
 		}
 
 		return true;
@@ -1448,7 +1453,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL, issimple))
 			return false;
 
 		/* There has to be one expression exactly. */
@@ -1496,7 +1501,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/* Check Var IN Array clauses by recursing. */
 		if (IsA(clause_expr, Var))
 			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+														 relid, attnums, exprs,
+														 issimple);
 
 		/* Otherwise we have Expr IN Array. */
 		*exprs = lappend(*exprs, clause_expr);
@@ -1533,8 +1539,11 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			 */
 			if (!statext_is_compatible_clause_internal(root,
 													   (Node *) lfirst(lc),
-													   relid, attnums, exprs))
+													   relid, attnums, exprs,
+													   issimple))
 				return false;
+
+			*issimple = false;
 		}
 
 		return true;
@@ -1548,7 +1557,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/* Check Var IS NULL clauses by recursing. */
 		if (IsA(nt->arg, Var))
 			return statext_is_compatible_clause_internal(root, (Node *) (nt->arg),
-														 relid, attnums, exprs);
+														 relid, attnums, exprs,
+														 issimple);
 
 		/* Otherwise we have Expr IS NULL. */
 		*exprs = lappend(*exprs, nt->arg);
@@ -1584,13 +1594,15 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
  */
 static bool
 statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
-							 Bitmapset **attnums, List **exprs)
+							 Bitmapset **attnums, List **exprs, bool *issimple)
 {
 	RangeTblEntry *rte = root->simple_rte_array[relid];
 	RestrictInfo *rinfo = (RestrictInfo *) clause;
 	int			clause_relid;
 	Oid			userid;
 
+	*issimple = true;
+
 	/*
 	 * Special-case handling for bare BoolExpr AND clauses, because the
 	 * restrictinfo machinery doesn't build RestrictInfos on top of AND
@@ -1608,8 +1620,11 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 		foreach(lc, expr->args)
 		{
 			if (!statext_is_compatible_clause(root, (Node *) lfirst(lc),
-											  relid, attnums, exprs))
+											  relid, attnums, exprs,
+											  issimple))
 				return false;
+
+			*issimple = false;
 		}
 
 		return true;
@@ -1630,7 +1645,7 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 
 	/* Check the clause and determine what attributes it references. */
 	if (!statext_is_compatible_clause_internal(root, (Node *) rinfo->clause,
-											   relid, attnums, exprs))
+											   relid, attnums, exprs, issimple))
 		return false;
 
 	/*
@@ -1720,6 +1735,7 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 	ListCell   *l;
 	Bitmapset **list_attnums;	/* attnums extracted from the clause */
 	List	  **list_exprs;		/* expressions matched to any statistic */
+	bool	  *list_simple;		/* marks simple expressions */
 	int			listidx;
 	Selectivity sel = (is_or) ? 0.0 : 1.0;
 
@@ -1733,6 +1749,9 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 	/* expressions extracted from complex expressions */
 	list_exprs = (List **) palloc(sizeof(Node *) * list_length(clauses));
 
+	/* expressions determined to be simple (single expression) */
+	list_simple = (bool *) palloc(sizeof(bool) * list_length(clauses));
+
 	/*
 	 * Pre-process the clauses list to extract the attnums and expressions
 	 * seen in each item.  We need to determine if there are any clauses which
@@ -1750,17 +1769,21 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 		Node	   *clause = (Node *) lfirst(l);
 		Bitmapset  *attnums = NULL;
 		List	   *exprs = NIL;
+		bool		issimple = false;
 
 		if (!bms_is_member(listidx, *estimatedclauses) &&
-			statext_is_compatible_clause(root, clause, rel->relid, &attnums, &exprs))
+			statext_is_compatible_clause(root, clause, rel->relid,
+										 &attnums, &exprs, &issimple))
 		{
 			list_attnums[listidx] = attnums;
 			list_exprs[listidx] = exprs;
+			list_simple[listidx] = issimple;
 		}
 		else
 		{
 			list_attnums[listidx] = NULL;
 			list_exprs[listidx] = NIL;
+			list_simple[listidx] = false;
 		}
 
 		listidx++;
@@ -1797,6 +1820,8 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 		listidx = -1;
 		foreach(l, clauses)
 		{
+			Node *clause = (Node *) lfirst(l);
+
 			/* Increment the index before we decide if to skip the clause. */
 			listidx++;
 
@@ -1836,12 +1861,12 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 			if ((list_attnums[listidx] == NULL &&
 				 list_length(list_exprs[listidx]) == 1) ||
 				(list_exprs[listidx] == NIL &&
-				 bms_membership(list_attnums[listidx]) == BMS_SINGLETON))
+				 list_simple[listidx]))
 				simple_clauses = bms_add_member(simple_clauses,
 												list_length(stat_clauses));
 
 			/* add clause to list and mark it as estimated */
-			stat_clauses = lappend(stat_clauses, (Node *) lfirst(l));
+			stat_clauses = lappend(stat_clauses, clause);
 			*estimatedclauses = bms_add_member(*estimatedclauses, listidx);
 
 			/*
@@ -2050,13 +2075,15 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp)
+examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp,
+					  bool *issimplep)
 {
 	List	   *exprs = NIL;
 	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
+	bool		issimple;
 
 	/* enforced by statext_is_compatible_clause_internal */
 	Assert(list_length(args) == 2);
@@ -2082,12 +2109,14 @@ examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp
 		exprs = lappend(exprs, rightop);
 		cst = (Const *) leftop;
 		expronleft = false;
+		issimple = true;
 	}
 	else
 	{
 		exprs = lappend(exprs, leftop);
 		exprs = lappend(exprs, rightop);
 		expronleft = false;
+		issimple = true;
 	}
 
 	/* return pointers to the extracted parts if requested */
@@ -2100,6 +2129,9 @@ examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp
 	if (expronleftp)
 		*expronleftp = expronleft;
 
+	if (issimplep)
+		*issimplep = issimple;
+
 	return true;
 }
 
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 85f650f572..ad8172db90 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1654,7 +1654,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft, NULL))
 				elog(ERROR, "incompatible clause");
 
 			if (cst)	/* Expr op Const */
@@ -1820,7 +1820,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft, NULL))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 1f30fa9060..d86cc4184b 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -100,7 +100,7 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									int numattrs, AttrNumber *attnums);
 
 extern bool examine_opclause_args(List *args, List **exprs, Const **cstp,
-								  bool *expronleftp);
+								  bool *expronleftp, bool *issimplep);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
 											 Selectivity mcv_sel,
-- 
2.31.1

#35Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#34)
Re: Use extended statistics to estimate (Var op Var) clauses

On Sat, Aug 28, 2021 at 6:53 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

Hi,

The attached patch series is modified to improve estimates for these
special clauses (Var op Var with the same var on both sides) without
extended statistics. This is done in 0001, and it seems fairly simple
and cheap.

The 0002 part is still the same patch as on 2021/07/20. Part 0003 fixes
handling of those clauses so that we don't treat them as simple, but it
does that by tweaking statext_is_compatible_clause(), as suggested by
Dean. It does work, although it's a bit more invasive than simply
checking the shape of clause in statext_mcv_clauselist_selectivity.

I do have results for the randomly generated queries, and this does
improve the situation a lot - pretty much all the queries with (a=a) or
(a<a) clauses had terrible estimates, and this fixes that.

That being said, I'm still not sure if this is an issue in real-world
applications, or whether we're solving something because of synthetic
queries generated by the randomized generator. But the checks seem
fairly cheap, so maybe it doesn't matter too much.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Hi,
For 0001-Improve-estimates-for-Var-op-Var-with-the-s-20210828.patch :

+ * form (variable op variable) with the save variable on both sides.

typo: save -> same

Cheers

#36Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tomas Vondra (#34)
Re: Use extended statistics to estimate (Var op Var) clauses

On Aug 28, 2021, at 6:52 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

Part 0003 fixes handling of those clauses so that we don't treat them as simple, but it does that by tweaking statext_is_compatible_clause(), as suggested by Dean.

Function examine_opclause_args() doesn't set issimple to anything in the IsA(rightop, Const) case, but assigns *issimplep = issimple at the bottom. The compiler is not complaining about using a possibly uninitialized variable, but if I change the "return true" on the very next line to "return issimple", the compiler complains quite loudly.

Some functions define bool *issimple, others bool *issimplep and bool issimple. You might want to standardize the naming.

It's difficult to know what "simple" means in extended_stats.c. There is no file-global comment explaining the concept, and functions like compare_scalars_simple don't have correlates named compare_scalars_complex or such, so the reader cannot infer by comparison what the difference might be between a "simple" case and some non-"simple" case. The functions' issimple (or issimplep) argument are undocumented.

There is a comment:

/*
* statext_mcv_clauselist_selectivity
* Estimate clauses using the best multi-column statistics.
....
*
* - simple selectivity: Computed without extended statistics, i.e. as if the
* columns/clauses were independent.
*
....
*/

but it takes a while to find if you search for "issimple".

In both scalarineqsel_wrapper() and eqsel_internal(), the call to matching_restriction_variables() should usually return false, since comparing a variable to itself is an unusual case. The next call is to get_restriction_variable(), which repeats the work of examining the left and right variables. So in almost all cases, after throwing away the results of:

examine_variable(root, left, varRelid, &ldata);
examine_variable(root, right, varRelid, &rdata);

performed in matching_restriction_variables(), we'll do exactly the same work again (with one variable named differently) in get_restriction_variable():

examine_variable(root, left, varRelid, vardata);
examine_variable(root, right, varRelid, &rdata);

That'd be fine if example_variable() were a cheap function, but it appears not to be. Do you think you could save the results rather than recomputing them? It's a little messy, since these are the only two functions out of about ten which follow this pattern, so you'd have to pass NULLs into get_restriction_variable() from the other eight callers, but it still looks like that would be a win.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#37Zhihong Yu
zyu@yugabyte.com
In reply to: Mark Dilger (#36)
Re: Use extended statistics to estimate (Var op Var) clauses

On Sat, Aug 28, 2021 at 9:30 AM Mark Dilger <mark.dilger@enterprisedb.com>
wrote:

On Aug 28, 2021, at 6:52 AM, Tomas Vondra <tomas.vondra@enterprisedb.com>

wrote:

Part 0003 fixes handling of those clauses so that we don't treat them as

simple, but it does that by tweaking statext_is_compatible_clause(), as
suggested by Dean.

Function examine_opclause_args() doesn't set issimple to anything in the
IsA(rightop, Const) case, but assigns *issimplep = issimple at the bottom.
The compiler is not complaining about using a possibly uninitialized
variable, but if I change the "return true" on the very next line to
"return issimple", the compiler complains quite loudly.

Some functions define bool *issimple, others bool *issimplep and bool
issimple. You might want to standardize the naming.

It's difficult to know what "simple" means in extended_stats.c. There is
no file-global comment explaining the concept, and functions like
compare_scalars_simple don't have correlates named compare_scalars_complex
or such, so the reader cannot infer by comparison what the difference might
be between a "simple" case and some non-"simple" case. The functions'
issimple (or issimplep) argument are undocumented.

There is a comment:

/*
* statext_mcv_clauselist_selectivity
* Estimate clauses using the best multi-column statistics.
....
*
* - simple selectivity: Computed without extended statistics, i.e. as if
the
* columns/clauses were independent.
*
....
*/

but it takes a while to find if you search for "issimple".

In both scalarineqsel_wrapper() and eqsel_internal(), the call to
matching_restriction_variables() should usually return false, since
comparing a variable to itself is an unusual case. The next call is to
get_restriction_variable(), which repeats the work of examining the left
and right variables. So in almost all cases, after throwing away the
results of:

examine_variable(root, left, varRelid, &ldata);
examine_variable(root, right, varRelid, &rdata);

performed in matching_restriction_variables(), we'll do exactly the same
work again (with one variable named differently) in
get_restriction_variable():

examine_variable(root, left, varRelid, vardata);
examine_variable(root, right, varRelid, &rdata);

That'd be fine if example_variable() were a cheap function, but it appears
not to be. Do you think you could save the results rather than recomputing
them? It's a little messy, since these are the only two functions out of
about ten which follow this pattern, so you'd have to pass NULLs into
get_restriction_variable() from the other eight callers, but it still looks
like that would be a win.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Hi,

I wonder if the queries with (a=a) or (a<a) clauses are worth this
additional complexity to address.
Has anyone seen such clause in production queries ?

I would think the randomly generated queries should be narrowed a bit to
exclude such cases where the result of the clause is known regardless of
the underlying data.

Cheers

#38Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Zhihong Yu (#37)
Re: Use extended statistics to estimate (Var op Var) clauses

On Aug 28, 2021, at 10:18 AM, Zhihong Yu <zyu@yugabyte.com> wrote:

I wonder if the queries with (a=a) or (a<a) clauses are worth this additional complexity to address.
Has anyone seen such clause in production queries ?

You might expect clauses like WHERE FALSE to be unusual, but that phrase gets added quite a lot by query generators. Somebody could add "WHERE a != a" in a misguided attempt to achieve the same thing.

I wouldn't be terribly surprised if query generators might generate queries with a variable number of tables joined together with comparisons between the joined tables, and in the degenerate case of only one table end up with a table column compared against itself.

You could argue that those people need to fix their queries/generators to not do this sort of thing, but the end user affected by such queries may have little ability to fix it.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#39Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Mark Dilger (#36)
3 attachment(s)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/28/21 6:30 PM, Mark Dilger wrote:

On Aug 28, 2021, at 6:52 AM, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Part 0003 fixes handling of those clauses so that we don't treat
them as simple, but it does that by tweaking
statext_is_compatible_clause(), as suggested by Dean.

Function examine_opclause_args() doesn't set issimple to anything in
the IsA(rightop, Const) case, but assigns *issimplep = issimple at
the bottom. The compiler is not complaining about using a possibly
uninitialized variable, but if I change the "return true" on the very
next line to "return issimple", the compiler complains quite loudly.

Yeah, true. Thanks for noticing this was a bug - I forgot to set the
issimple variable in the first branch.

Some functions define bool *issimple, others bool *issimplep and bool
issimple. You might want to standardize the naming.

I think the naming is standard with respect to the surrounding code. If
the other parameters use "p" to mark "pointer" then issimplep is used,
but in other places it's just "issimple". IMHO this is appropriate.

It's difficult to know what "simple" means in extended_stats.c.
There is no file-global comment explaining the concept, and functions
like compare_scalars_simple don't have correlates named
compare_scalars_complex or such, so the reader cannot infer by
comparison what the difference might be between a "simple" case and
some non-"simple" case. The functions' issimple (or issimplep)
argument are undocumented.

There is a comment:

/* * statext_mcv_clauselist_selectivity * Estimate clauses using
the best multi-column statistics. .... * * - simple selectivity:
Computed without extended statistics, i.e. as if the *
columns/clauses were independent. * .... */

but it takes a while to find if you search for "issimple".

Yeah, true. This was added a while ago when Dean reworked the estimation
(based on MCV), and it seemed clear back then. But now a comment
explaining this concept (and how it affects the estimation) would be
helpful. I'll try digging in the archives for the details.

In both scalarineqsel_wrapper() and eqsel_internal(), the call to
matching_restriction_variables() should usually return false, since
comparing a variable to itself is an unusual case. The next call is
to get_restriction_variable(), which repeats the work of examining
the left and right variables. So in almost all cases, after throwing
away the results of:

examine_variable(root, left, varRelid, &ldata);
examine_variable(root, right, varRelid, &rdata);

performed in matching_restriction_variables(), we'll do exactly the
same work again (with one variable named differently) in
get_restriction_variable():

examine_variable(root, left, varRelid, vardata);
examine_variable(root, right, varRelid, &rdata);

That'd be fine if example_variable() were a cheap function, but it
appears not to be. Do you think you could save the results rather
than recomputing them? It's a little messy, since these are the only
two functions out of about ten which follow this pattern, so you'd
have to pass NULLs into get_restriction_variable() from the other
eight callers, but it still looks like that would be a win.

I had similar concerns, although I don't think those functions are very
expensive compared to the rest of the estimation code. I haven't done
any measurements yet, though.

But I don't think saving the results is the way to go - in a way, we
already store the stats (which seems like the most expensive bit) in
syscache. It seems better to just simplify examine_variable() so that it
does not lookup the statistics, which we don't need here at all.

The attached version of the patches fixes the other bugs reported here
so far - most importantly it reworks how we set issimple while examining
the clauses, so that it's never skips the initialization. Hopefully the
added comments also explain it a bit more clearly.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Improve-estimates-for-Var-op-Var-with-the-s-20210830.patchtext/x-patch; charset=UTF-8; name=0001-Improve-estimates-for-Var-op-Var-with-the-s-20210830.patchDownload
From 71837f82db88baa9e0250d3868078140cf11f5a7 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Thu, 26 Aug 2021 23:01:04 +0200
Subject: [PATCH 1/3] Improve estimates for Var op Var with the same Var

When estimating (Var op Var) conditions, we can treat the case with the
same Var on both sides as a special case, and we can provide better
selectivity estimate than for the generic case.

For example for (a = a) we know it's 1.0, because all rows are expected
to match. Similarly for (a != a) , wich has selectivity 0.0. And the
same logic can be applied to inequality comparisons, like (a < a) etc.

In principle, those clauses are a bit strange and queries are unlikely
to use them. But query generators sometimes do silly things, and these
checks are quite cheap so it's likely a win.
---
 src/backend/utils/adt/selfuncs.c | 77 +++++++++++++++++++++++++++++++-
 1 file changed, 76 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 0c8c05f6c2..22608aadc1 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -210,7 +210,8 @@ static bool get_actual_variable_endpoint(Relation heapRel,
 										 MemoryContext outercontext,
 										 Datum *endpointDatum);
 static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
-
+static bool matching_restriction_variables(PlannerInfo *root, List *args,
+										   int varRelid);
 
 /*
  *		eqsel			- Selectivity of "=" for any data types.
@@ -256,6 +257,14 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
 		}
 	}
 
+	/*
+	 * It it's (variable = variable) with the same variable on both sides, it's
+	 * a special case and we know it's not expected to filter anything, so we
+	 * estimate the selectivity as 1.0 (or 0.0 if it's negated).
+	 */
+	if (matching_restriction_variables(root, args, varRelid))
+		return (negate) ? 0.0 : 1.0;
+
 	/*
 	 * If expression is not variable = something or something = variable, then
 	 * punt and return a default estimate.
@@ -1408,6 +1417,22 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
 	Oid			consttype;
 	double		selec;
 
+	/*
+	 * Handle (variable < variable) and (variable <= variable) with the same
+	 * variable on both sides as a special case. The strict inequality should
+	 * not match any rows, hence selectivity is 0.0. The other case is about
+	 * the same as equality, so selectivity is 1.0.
+	 */
+	if (matching_restriction_variables(root, args, varRelid))
+	{
+		/* The case with equality matches all rows, so estimate it as 1.0. */
+		if (iseq)
+			PG_RETURN_FLOAT8(1.0);
+
+		/* Strict inequality matches nothing, so selectivity is 0.0. */
+		PG_RETURN_FLOAT8(0.0);
+	}
+
 	/*
 	 * If expression is not variable op something or something op variable,
 	 * then punt and return a default estimate.
@@ -4871,6 +4896,56 @@ get_restriction_variable(PlannerInfo *root, List *args, int varRelid,
 	return false;
 }
 
+
+/*
+ * matching_restriction_variable
+ *		Examine the args of a restriction clause to see if it's of the
+ *		form (variable op variable) with the same variable on both sides.
+ *
+ * Inputs:
+ *	root: the planner info
+ *	args: clause argument list
+ *	varRelid: see specs for restriction selectivity functions
+ *
+ * Returns true if the same variable is on both sides, otherwise false.
+ */
+static bool
+matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
+{
+	Node	   *left,
+			   *right;
+	VariableStatData ldata;
+	VariableStatData rdata;
+	bool		res = false;
+
+	/* Fail if not a binary opclause (probably shouldn't happen) */
+	if (list_length(args) != 2)
+		return false;
+
+	left = (Node *) linitial(args);
+	right = (Node *) lsecond(args);
+
+	/*
+	 * Examine both sides.  Note that when varRelid is nonzero, Vars of other
+	 * relations will be treated as pseudoconstants.
+	 */
+	examine_variable(root, left, varRelid, &ldata);
+	examine_variable(root, right, varRelid, &rdata);
+
+	/*
+	 * If both sides are variable, and are equal, we win.
+	 */
+	if ((ldata.rel != NULL && rdata.rel != NULL) &&
+		equal(ldata.var, rdata.var))
+		res = true;
+
+	/* We don't need the stats. */
+	ReleaseVariableStats(ldata);
+	ReleaseVariableStats(rdata);
+
+	return res;
+}
+
 /*
  * get_join_variables
  *		Apply examine_variable() to each side of a join clause.
-- 
2.31.1

0002-main-patch-20210830.patchtext/x-patch; charset=UTF-8; name=0002-main-patch-20210830.patchDownload
From ee19131d26afb9029798fa42d0fcd46ac72589de Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 15 Aug 2021 13:23:13 +0200
Subject: [PATCH 2/3] main patch

---
 src/backend/optimizer/path/clausesel.c        |  37 +++-
 src/backend/statistics/extended_stats.c       |  83 ++++++---
 src/backend/statistics/mcv.c                  | 172 +++++++++++++-----
 .../statistics/extended_stats_internal.h      |   4 +-
 src/test/regress/expected/stats_ext.out       |  96 ++++++++++
 src/test/regress/sql/stats_ext.sql            |  26 +++
 6 files changed, 341 insertions(+), 77 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d263ecf082..6a7e9ceea5 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -714,6 +714,7 @@ clause_selectivity_ext(PlannerInfo *root,
 	Selectivity s1 = 0.5;		/* default for any unhandled clause type */
 	RestrictInfo *rinfo = NULL;
 	bool		cacheable = false;
+	Node	   *src = clause;
 
 	if (clause == NULL)			/* can this still happen? */
 		return s1;
@@ -871,11 +872,37 @@ clause_selectivity_ext(PlannerInfo *root,
 		}
 		else
 		{
-			/* Estimate selectivity for a restriction clause. */
-			s1 = restriction_selectivity(root, opno,
-										 opclause->args,
-										 opclause->inputcollid,
-										 varRelid);
+			/*
+			 * It might be a single (Expr op Expr) clause, which goes here due
+			 * to the optimization at the beginning of clauselist_selectivity.
+			 * So we try applying extended stats first, and then fall back to
+			 * restriction_selectivity.
+			 */
+			bool	estimated = false;
+
+			if (use_extended_stats)
+			{
+				List	   *clauses = list_make1(src);
+				RelOptInfo *rel = find_single_rel_for_clauses(root, clauses);
+
+				if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+				{
+					Bitmapset  *estimatedclauses = NULL;
+
+					s1 = statext_clauselist_selectivity(root, clauses, varRelid,
+														jointype, sjinfo, rel,
+														&estimatedclauses, false);
+
+					estimated = (bms_num_members(estimatedclauses) == 1);
+				}
+			}
+
+			/* Estimate selectivity for a restriction clause (fallback). */
+			if (!estimated)
+				s1 = restriction_selectivity(root, opno,
+											 opclause->args,
+											 opclause->inputcollid,
+											 varRelid);
 		}
 
 		/*
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 2e55913bc8..606cf8c588 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1347,19 +1347,27 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		return true;
 	}
 
-	/* (Var/Expr op Const) or (Const op Var/Expr) */
+	/*
+	 * Three opclause variants are supported: (Expr op Const), (Const op Expr),
+	 * (Expr op Expr). That means we may need to analyze one or two expressions
+	 * to make sure the opclause is compatible with extended stats.
+	 */
 	if (is_opclause(clause))
 	{
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
-		Node	   *clause_expr;
+		ListCell   *lc;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
-		/* Check if the expression has the right shape */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		/*
+		 * Check if the expression has the right shape. This returns either one
+		 * or two expressions, depending on whether there is a Const.
+		 */
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
 		/*
@@ -1399,13 +1407,31 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			!get_func_leakproof(get_opcode(expr->opno)))
 			return false;
 
-		/* Check (Var op Const) or (Const op Var) clauses by recursing. */
-		if (IsA(clause_expr, Var))
-			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+		/*
+		 * There's always at least one expression, otherwise the clause would
+		 * not be considered compatible.
+		 */
+		Assert(list_length(clause_exprs) >= 1);
+
+		/*
+		 * Check all expressions by recursing. Var expressions are handled as
+		 * a special case (to match it to attnums etc.)
+		 */
+		foreach (lc, clause_exprs)
+		{
+			Node *clause_expr = (Node *) lfirst(lc);
+
+			if (IsA(clause_expr, Var))
+			{
+				/* if the Var is incompatible, the whole clause is incompatible */
+				if (!statext_is_compatible_clause_internal(root, clause_expr,
+														   relid, attnums, exprs))
+					return false;
+			}
+			else	/* generic expression */
+				*exprs = lappend(*exprs, clause_expr);
+		}
 
-		/* Otherwise we have (Expr op Const) or (Const op Expr). */
-		*exprs = lappend(*exprs, clause_expr);
 		return true;
 	}
 
@@ -1415,15 +1441,21 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) clause;
 		Node	   *clause_expr;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
+		/* There has to be one expression exactly. */
+		Assert(list_length(clause_exprs) == 1);
+
+		clause_expr = (Node *) linitial(clause_exprs);
+
 		/*
 		 * If it's not one of the supported operators ("=", "<", ">", etc.),
 		 * just ignore the clause, as it's not compatible with MCV lists.
@@ -2009,20 +2041,19 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  * examine_opclause_args
  *		Split an operator expression's arguments into Expr and Const parts.
  *
- * Attempts to match the arguments to either (Expr op Const) or (Const op
- * Expr), possibly with a RelabelType on top. When the expression matches this
- * form, returns true, otherwise returns false.
+ * Attempts to match the arguments to either (Expr op Const) or (Const op Expr)
+ * or (Expr op Expr), possibly with a RelabelType on top. When the expression
+ * matches this form, returns true, otherwise returns false.
  *
  * Optionally returns pointers to the extracted Expr/Const nodes, when passed
- * non-null pointers (exprp, cstp and expronleftp). The expronleftp flag
+ * non-null pointers (exprsp, cstp and expronleftp). The expronleftp flag
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, Node **exprp, Const **cstp,
-					  bool *expronleftp)
+examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp)
 {
-	Node	   *expr;
-	Const	   *cst;
+	List	   *exprs = NIL;
+	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
@@ -2042,22 +2073,26 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 
 	if (IsA(rightop, Const))
 	{
-		expr = (Node *) leftop;
+		exprs = lappend(exprs, leftop);
 		cst = (Const *) rightop;
 		expronleft = true;
 	}
 	else if (IsA(leftop, Const))
 	{
-		expr = (Node *) rightop;
+		exprs = lappend(exprs, rightop);
 		cst = (Const *) leftop;
 		expronleft = false;
 	}
 	else
-		return false;
+	{
+		exprs = lappend(exprs, leftop);
+		exprs = lappend(exprs, rightop);
+		expronleft = false;
+	}
 
 	/* return pointers to the extracted parts if requested */
-	if (exprp)
-		*exprp = expr;
+	if (exprsp)
+		*exprsp = exprs;
 
 	if (cstp)
 		*cstp = cst;
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index ef118952c7..85f650f572 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1645,78 +1645,154 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			Node	   *clause_expr2;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
-			int			idx;
 			Oid			collid;
 
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
-			/* match the attribute/expression to a dimension of the statistic */
-			idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+			if (cst)	/* Expr op Const */
+			{
+				int idx;
 
-			Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert(list_length(clause_exprs) == 1);
+				clause_expr = (Node *) linitial(clause_exprs);
 
-			/*
-			 * Walk through the MCV items and evaluate the current clause. We
-			 * can skip items that were already ruled out, and terminate if
-			 * there are no remaining MCV items that might possibly match.
-			 */
-			for (i = 0; i < mcvlist->nitems; i++)
-			{
-				bool		match = true;
-				MCVItem    *item = &mcvlist->items[i];
+				/* match the attribute/expression to a dimension of the statistic */
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
 
-				Assert(idx >= 0);
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * When the MCV item or the Const value is NULL we can treat
-				 * this as a mismatch. We must not call the operator because
-				 * of strictness.
+				 * Walk through the MCV items and evaluate the current clause. We
+				 * can skip items that were already ruled out, and terminate if
+				 * there are no remaining MCV items that might possibly match.
 				 */
-				if (item->isnull[idx] || cst->constisnull)
+				for (i = 0; i < mcvlist->nitems; i++)
 				{
-					matches[i] = RESULT_MERGE(matches[i], is_or, false);
-					continue;
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					Assert(idx >= 0);
+
+					/*
+					 * When the MCV item or the Const value is NULL we can treat
+					 * this as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || cst->constisnull)
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap. Once
+					 * the value gets false for AND-lists, or true for OR-lists,
+					 * we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics, but
+					 * we can use the collation for the attribute itself, as
+					 * stored in varcollid. We do reset the statistics after a
+					 * type change (including collation change), so this is OK.
+					 * For expressions, we use the collation extracted from the
+					 * expression itself.
+					 */
+					if (expronleft)
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   item->values[idx],
+															   cst->constvalue));
+					else
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   cst->constvalue,
+															   item->values[idx]));
+
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
 				}
+			}
+			else	/* Expr op Expr */
+			{
+				int			idx;
+				int			idx2;
+
+				Assert(list_length(clause_exprs) == 2);
+
+				clause_expr = (Node *) linitial(clause_exprs);
+				clause_expr2 = (Node *) lsecond(clause_exprs);
+
+				Assert(clause_expr2);
+				Assert(!expronleft);
 
 				/*
-				 * Skip MCV items that can't change result in the bitmap. Once
-				 * the value gets false for AND-lists, or true for OR-lists,
-				 * we don't need to look at more clauses.
+				 * Match the expressions to a dimension of the statistic.
+				 *
+				 * XXX Can the collations differ?
 				 */
-				if (RESULT_IS_FINAL(matches[i], is_or))
-					continue;
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+				idx2 = mcv_match_expression(clause_expr2, keys, exprs, &collid);
+
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert((idx2 >= 0) && (idx2 < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * First check whether the constant is below the lower
-				 * boundary (in that case we can skip the bucket, because
-				 * there's no overlap).
-				 *
-				 * We don't store collations used to build the statistics, but
-				 * we can use the collation for the attribute itself, as
-				 * stored in varcollid. We do reset the statistics after a
-				 * type change (including collation change), so this is OK.
-				 * For expressions, we use the collation extracted from the
-				 * expression itself.
+				 * Walk through the MCV items and evaluate the current clause.
+				 * We can skip items that were already ruled out, and
+				 * terminate if there are no remaining MCV items that might
+				 * possibly match.
 				 */
-				if (expronleft)
+				for (i = 0; i < mcvlist->nitems; i++)
+				{
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					/*
+					 * When either of the MCV items is NULL we can treat this
+					 * as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || item->isnull[idx2])
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap.
+					 * Once the value gets false for AND-lists, or true for
+					 * OR-lists, we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics,
+					 * but we can use the collation for the attribute itself,
+					 * as stored in varcollid. We do reset the statistics after
+					 * a type change (including collation change), so this is
+					 * OK. We may need to relax this after allowing extended
+					 * statistics on expressions.
+					 */
 					match = DatumGetBool(FunctionCall2Coll(&opproc,
 														   collid,
 														   item->values[idx],
-														   cst->constvalue));
-				else
-					match = DatumGetBool(FunctionCall2Coll(&opproc,
-														   collid,
-														   cst->constvalue,
-														   item->values[idx]));
+														   item->values[idx2]));
 
-				/* update the match bitmap with the result */
-				matches[i] = RESULT_MERGE(matches[i], is_or, match);
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
+				}
 			}
 		}
 		else if (IsA(clause, ScalarArrayOpExpr))
@@ -1726,6 +1802,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
 			Oid			collid;
@@ -1743,11 +1820,14 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
 			Assert(expronleft);
+			Assert(list_length(clause_exprs) == 1);
+
+			clause_expr = (Node *) linitial(clause_exprs);
 
 			/* XXX what if (cst->constisnull == NULL)? */
 			if (!cst->constisnull)
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 55cd9252a5..1f30fa9060 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -99,8 +99,8 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									MultiSortSupport mss,
 									int numattrs, AttrNumber *attnums);
 
-extern bool examine_opclause_args(List *args, Node **exprp,
-								  Const **cstp, bool *expronleftp);
+extern bool examine_opclause_args(List *args, List **exprs, Const **cstp,
+								  bool *expronleftp);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
 											 Selectivity mcv_sel,
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 7fb54de53d..301227079c 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1904,6 +1904,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        343 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      1667 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2051,6 +2063,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        200 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      3750 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2460,6 +2484,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       3750 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+        25 |   2500
+(1 row)
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 ANALYZE mcv_lists;
@@ -2493,6 +2523,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       2500 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -2587,6 +2623,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
       1094 |      0
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      9950 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 ANALYZE mcv_lists_bool;
@@ -2742,6 +2790,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 (1 row)
 
 DROP TABLE mcv_lists_partial;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -2801,6 +2861,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       2649 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+         1 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
 CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
@@ -2847,6 +2925,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       1571 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
 DROP TABLE mcv_lists_multi;
 -- statistics on integer expressions
 CREATE TABLE expr_stats (a int, b int, c int);
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index d563c4654c..b0cb0f1924 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -946,6 +946,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -999,6 +1003,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -1198,6 +1206,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 
@@ -1216,6 +1226,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -1291,6 +1303,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 
@@ -1376,6 +1392,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 
 DROP TABLE mcv_lists_partial;
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -1403,6 +1423,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
@@ -1417,6 +1440,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 DROP TABLE mcv_lists_multi;
 
-- 
2.31.1

0003-Don-t-treat-Var-op-Var-as-simple-clauses-20210830.patchtext/x-patch; charset=UTF-8; name=0003-Don-t-treat-Var-op-Var-as-simple-clauses-20210830.patchDownload
From 7926ea227c961e5c5ba3f4cefd254d794cf5f62d Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Wed, 18 Aug 2021 11:53:45 +0200
Subject: [PATCH 3/3] Don't treat Var op Var as simple clauses

---
 src/backend/statistics/extended_stats.c       | 86 +++++++++++++++----
 src/backend/statistics/mcv.c                  |  4 +-
 .../statistics/extended_stats_internal.h      |  2 +-
 3 files changed, 74 insertions(+), 18 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 606cf8c588..5f283a0afb 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1315,11 +1315,14 @@ choose_best_statistics(List *stats, char requiredkind,
  * statext_is_compatible_clause. It needs to be split like this because
  * of recursion.  The attnums bitmap is an input/output parameter collecting
  * attribute numbers from all compatible clauses (recursively).
+ *
+ * XXX The issimple variable is expected to be initialized by the caller, we
+ * just update it while recursively analyzing the current clause.
  */
 static bool
 statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 									  Index relid, Bitmapset **attnums,
-									  List **exprs)
+									  List **exprs, bool *issimple)
 {
 	/* Look inside any binary-compatible relabeling (as in examine_variable) */
 	if (IsA(clause, RelabelType))
@@ -1367,7 +1370,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		 * Check if the expression has the right shape. This returns either one
 		 * or two expressions, depending on whether there is a Const.
 		 */
-		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL, issimple))
 			return false;
 
 		/*
@@ -1416,6 +1419,12 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/*
 		 * Check all expressions by recursing. Var expressions are handled as
 		 * a special case (to match it to attnums etc.)
+		 *
+		 * An opclause is simple if it's (Expr op Const) or (Const op Expr). We
+		 * have already checked the overall shape in examine_opclause_args, but
+		 * we haven't checked the expressions are simple (i.e. pretty much Var),
+		 * so we need to check that now. If we discover a complex expression, we
+		 * consider the whole clause complex.
 		 */
 		foreach (lc, clause_exprs)
 		{
@@ -1425,11 +1434,17 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			{
 				/* if the Var is incompatible, the whole clause is incompatible */
 				if (!statext_is_compatible_clause_internal(root, clause_expr,
-														   relid, attnums, exprs))
+														   relid, attnums, exprs,
+														   issimple))
 					return false;
 			}
 			else	/* generic expression */
+			{
 				*exprs = lappend(*exprs, clause_expr);
+
+				/* switch to false if there are any complex clauses */
+				*issimple = false;
+			}
 		}
 
 		return true;
@@ -1448,7 +1463,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL, issimple))
 			return false;
 
 		/* There has to be one expression exactly. */
@@ -1496,7 +1511,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/* Check Var IN Array clauses by recursing. */
 		if (IsA(clause_expr, Var))
 			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+														 relid, attnums, exprs,
+														 issimple);
 
 		/* Otherwise we have Expr IN Array. */
 		*exprs = lappend(*exprs, clause_expr);
@@ -1525,6 +1541,14 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		BoolExpr   *expr = (BoolExpr *) clause;
 		ListCell   *lc;
 
+		/*
+		 * All AND/OR clauses are considered complex, even if all arguments are
+		 * simple clauses. For NOT clauses we need to check the argument and then
+		 * we can update the flag.
+		 */
+		if (!is_notclause(clause))
+			*issimple = false;
+
 		foreach(lc, expr->args)
 		{
 			/*
@@ -1533,7 +1557,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			 */
 			if (!statext_is_compatible_clause_internal(root,
 													   (Node *) lfirst(lc),
-													   relid, attnums, exprs))
+													   relid, attnums, exprs,
+													   issimple))
 				return false;
 		}
 
@@ -1548,7 +1573,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/* Check Var IS NULL clauses by recursing. */
 		if (IsA(nt->arg, Var))
 			return statext_is_compatible_clause_internal(root, (Node *) (nt->arg),
-														 relid, attnums, exprs);
+														 relid, attnums, exprs,
+														 issimple);
 
 		/* Otherwise we have Expr IS NULL. */
 		*exprs = lappend(*exprs, nt->arg);
@@ -1584,22 +1610,35 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
  */
 static bool
 statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
-							 Bitmapset **attnums, List **exprs)
+							 Bitmapset **attnums, List **exprs, bool *issimple)
 {
 	RangeTblEntry *rte = root->simple_rte_array[relid];
 	RestrictInfo *rinfo = (RestrictInfo *) clause;
 	int			clause_relid;
 	Oid			userid;
 
+	/*
+	 * Clauses are considered simple by default, and we mark them as complex
+	 * when we discover a complex part.
+	 */
+	*issimple = true;
+
 	/*
 	 * Special-case handling for bare BoolExpr AND clauses, because the
 	 * restrictinfo machinery doesn't build RestrictInfos on top of AND
 	 * clauses.
+	 *
+	 * AND clauses are considered complex, even if all arguments are
+	 * simple clauses.
 	 */
 	if (is_andclause(clause))
 	{
 		BoolExpr   *expr = (BoolExpr *) clause;
 		ListCell   *lc;
+		bool		tmp = false;	/* ignored result */
+
+		/* AND clauses are complex, even if the arguments are simple. */
+		*issimple = false;
 
 		/*
 		 * Check that each sub-clause is compatible.  We expect these to be
@@ -1608,7 +1647,7 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 		foreach(lc, expr->args)
 		{
 			if (!statext_is_compatible_clause(root, (Node *) lfirst(lc),
-											  relid, attnums, exprs))
+											  relid, attnums, exprs, &tmp))
 				return false;
 		}
 
@@ -1630,7 +1669,7 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 
 	/* Check the clause and determine what attributes it references. */
 	if (!statext_is_compatible_clause_internal(root, (Node *) rinfo->clause,
-											   relid, attnums, exprs))
+											   relid, attnums, exprs, issimple))
 		return false;
 
 	/*
@@ -1720,6 +1759,7 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 	ListCell   *l;
 	Bitmapset **list_attnums;	/* attnums extracted from the clause */
 	List	  **list_exprs;		/* expressions matched to any statistic */
+	bool	  *list_simple;		/* marks simple expressions */
 	int			listidx;
 	Selectivity sel = (is_or) ? 0.0 : 1.0;
 
@@ -1733,6 +1773,9 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 	/* expressions extracted from complex expressions */
 	list_exprs = (List **) palloc(sizeof(Node *) * list_length(clauses));
 
+	/* expressions determined to be simple (single expression) */
+	list_simple = (bool *) palloc(sizeof(bool) * list_length(clauses));
+
 	/*
 	 * Pre-process the clauses list to extract the attnums and expressions
 	 * seen in each item.  We need to determine if there are any clauses which
@@ -1750,17 +1793,21 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 		Node	   *clause = (Node *) lfirst(l);
 		Bitmapset  *attnums = NULL;
 		List	   *exprs = NIL;
+		bool		issimple = false;
 
 		if (!bms_is_member(listidx, *estimatedclauses) &&
-			statext_is_compatible_clause(root, clause, rel->relid, &attnums, &exprs))
+			statext_is_compatible_clause(root, clause, rel->relid,
+										 &attnums, &exprs, &issimple))
 		{
 			list_attnums[listidx] = attnums;
 			list_exprs[listidx] = exprs;
+			list_simple[listidx] = issimple;
 		}
 		else
 		{
 			list_attnums[listidx] = NULL;
 			list_exprs[listidx] = NIL;
+			list_simple[listidx] = false;
 		}
 
 		listidx++;
@@ -1797,6 +1844,8 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 		listidx = -1;
 		foreach(l, clauses)
 		{
+			Node *clause = (Node *) lfirst(l);
+
 			/* Increment the index before we decide if to skip the clause. */
 			listidx++;
 
@@ -1835,13 +1884,12 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 			/* record simple clauses (single column or expression) */
 			if ((list_attnums[listidx] == NULL &&
 				 list_length(list_exprs[listidx]) == 1) ||
-				(list_exprs[listidx] == NIL &&
-				 bms_membership(list_attnums[listidx]) == BMS_SINGLETON))
+				 list_simple[listidx])
 				simple_clauses = bms_add_member(simple_clauses,
 												list_length(stat_clauses));
 
 			/* add clause to list and mark it as estimated */
-			stat_clauses = lappend(stat_clauses, (Node *) lfirst(l));
+			stat_clauses = lappend(stat_clauses, clause);
 			*estimatedclauses = bms_add_member(*estimatedclauses, listidx);
 
 			/*
@@ -2050,13 +2098,15 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp)
+examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp,
+					  bool *issimplep)
 {
 	List	   *exprs = NIL;
 	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
+	bool		issimple;
 
 	/* enforced by statext_is_compatible_clause_internal */
 	Assert(list_length(args) == 2);
@@ -2076,18 +2126,21 @@ examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp
 		exprs = lappend(exprs, leftop);
 		cst = (Const *) rightop;
 		expronleft = true;
+		issimple = true;
 	}
 	else if (IsA(leftop, Const))
 	{
 		exprs = lappend(exprs, rightop);
 		cst = (Const *) leftop;
 		expronleft = false;
+		issimple = true;
 	}
 	else
 	{
 		exprs = lappend(exprs, leftop);
 		exprs = lappend(exprs, rightop);
 		expronleft = false;
+		issimple = false;
 	}
 
 	/* return pointers to the extracted parts if requested */
@@ -2100,6 +2153,9 @@ examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp
 	if (expronleftp)
 		*expronleftp = expronleft;
 
+	if (issimplep)
+		*issimplep = (*issimplep && issimple);
+
 	return true;
 }
 
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 85f650f572..ad8172db90 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1654,7 +1654,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft, NULL))
 				elog(ERROR, "incompatible clause");
 
 			if (cst)	/* Expr op Const */
@@ -1820,7 +1820,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft, NULL))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 1f30fa9060..d86cc4184b 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -100,7 +100,7 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									int numattrs, AttrNumber *attnums);
 
 extern bool examine_opclause_args(List *args, List **exprs, Const **cstp,
-								  bool *expronleftp);
+								  bool *expronleftp, bool *issimplep);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
 											 Selectivity mcv_sel,
-- 
2.31.1

#40Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#39)
Re: Use extended statistics to estimate (Var op Var) clauses

On Mon, Aug 30, 2021 at 9:00 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

On 8/28/21 6:30 PM, Mark Dilger wrote:

On Aug 28, 2021, at 6:52 AM, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Part 0003 fixes handling of those clauses so that we don't treat
them as simple, but it does that by tweaking
statext_is_compatible_clause(), as suggested by Dean.

Function examine_opclause_args() doesn't set issimple to anything in
the IsA(rightop, Const) case, but assigns *issimplep = issimple at
the bottom. The compiler is not complaining about using a possibly
uninitialized variable, but if I change the "return true" on the very
next line to "return issimple", the compiler complains quite loudly.

Yeah, true. Thanks for noticing this was a bug - I forgot to set the
issimple variable in the first branch.

Some functions define bool *issimple, others bool *issimplep and bool
issimple. You might want to standardize the naming.

I think the naming is standard with respect to the surrounding code. If
the other parameters use "p" to mark "pointer" then issimplep is used,
but in other places it's just "issimple". IMHO this is appropriate.

It's difficult to know what "simple" means in extended_stats.c.
There is no file-global comment explaining the concept, and functions
like compare_scalars_simple don't have correlates named
compare_scalars_complex or such, so the reader cannot infer by
comparison what the difference might be between a "simple" case and
some non-"simple" case. The functions' issimple (or issimplep)
argument are undocumented.

There is a comment:

/* * statext_mcv_clauselist_selectivity * Estimate clauses using
the best multi-column statistics. .... * * - simple selectivity:
Computed without extended statistics, i.e. as if the *
columns/clauses were independent. * .... */

but it takes a while to find if you search for "issimple".

Yeah, true. This was added a while ago when Dean reworked the estimation
(based on MCV), and it seemed clear back then. But now a comment
explaining this concept (and how it affects the estimation) would be
helpful. I'll try digging in the archives for the details.

In both scalarineqsel_wrapper() and eqsel_internal(), the call to
matching_restriction_variables() should usually return false, since
comparing a variable to itself is an unusual case. The next call is
to get_restriction_variable(), which repeats the work of examining
the left and right variables. So in almost all cases, after throwing
away the results of:

examine_variable(root, left, varRelid, &ldata);
examine_variable(root, right, varRelid, &rdata);

performed in matching_restriction_variables(), we'll do exactly the
same work again (with one variable named differently) in
get_restriction_variable():

examine_variable(root, left, varRelid, vardata);
examine_variable(root, right, varRelid, &rdata);

That'd be fine if example_variable() were a cheap function, but it
appears not to be. Do you think you could save the results rather
than recomputing them? It's a little messy, since these are the only
two functions out of about ten which follow this pattern, so you'd
have to pass NULLs into get_restriction_variable() from the other
eight callers, but it still looks like that would be a win.

I had similar concerns, although I don't think those functions are very
expensive compared to the rest of the estimation code. I haven't done
any measurements yet, though.

But I don't think saving the results is the way to go - in a way, we
already store the stats (which seems like the most expensive bit) in
syscache. It seems better to just simplify examine_variable() so that it
does not lookup the statistics, which we don't need here at all.

The attached version of the patches fixes the other bugs reported here
so far - most importantly it reworks how we set issimple while examining
the clauses, so that it's never skips the initialization. Hopefully the
added comments also explain it a bit more clearly.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Hi,
For patch 0002,

+                   s1 = statext_clauselist_selectivity(root, clauses,
varRelid,
+                                                       jointype, sjinfo,
rel,
+                                                       &estimatedclauses,
false);
+
+                   estimated = (bms_num_members(estimatedclauses) == 1);

I took a look at clauselist_apply_dependencies() (called by
statext_clauselist_selectivity) where estimatedclauses is modified.
Since the caller would not use the returned Selectivity if number of
elements in estimatedclauses is greater than 1, I wonder
if a parameter can be added to clauselist_apply_dependencies() which
indicates early return if the second element is added to estimatedclauses.

Cheers

#41Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Zhihong Yu (#40)
Re: Use extended statistics to estimate (Var op Var) clauses

On 8/31/21 00:14, Zhihong Yu wrote:

Hi,
For patch 0002,

+                   s1 = statext_clauselist_selectivity(root, clauses, 
varRelid,
+                                                       jointype, 
sjinfo, rel,
+                                                       
&estimatedclauses, false);
+
+                   estimated = (bms_num_members(estimatedclauses) == 1);

I took a look at clauselist_apply_dependencies() (called by
statext_clauselist_selectivity) where estimatedclauses is modified.
Since the caller would not use the returned Selectivity if number of
elements in estimatedclauses is greater than 1, I wonder
if a parameter can be added to clauselist_apply_dependencies() which
indicates early return if the second element is added to estimatedclauses.

Hmmm, I'm not sure I understand your point. Are you suggesting there's a
bug in not updating the bitmap, or would this be an optimization? Can
you give an example of a query affected by this?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#42Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#39)
5 attachment(s)
Re: Use extended statistics to estimate (Var op Var) clauses

Hi,

I finally got around to this patch again, focusing mostly on the first
part that simply returns either 1.0 or 0.0 for Var op Var conditions
(i.e. the part not really using extended statistics).

I have been unhappy about using examine_variable, which does various
expensive things like searching for statistics (which only got worse
because now we're also looking for expression stats). But we don't
really need the stats - we just need to check the Vars match (same
relation, same attribute). So 0002 fixes this.

Which got me thinking that maybe we don't need to restrict this to Var
nodes only. We can just as easily compare arbitrary expressions,
provided it's for the same relation and there are no volatile functions.
So 0003 does this. Conditions with the same complex expression on each
side of an operator are probably fairly rare, but it's cheap so why not.

0004 and 0005 parts are unchanged.

The next steps is adding some tests to the first parts, and extending
the tests in the main patch (to also use more complex expressions, if
0003 gets included).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Improve-estimates-for-Var-op-Var-with-the-same-Var.patchtext/x-patch; charset=UTF-8; name=0001-Improve-estimates-for-Var-op-Var-with-the-same-Var.patchDownload
From 91371c3e4252580a30530d5b625f72d5525c5c73 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Thu, 26 Aug 2021 23:01:04 +0200
Subject: [PATCH 1/5] Improve estimates for Var op Var with the same Var

When estimating (Var op Var) conditions, we can treat the case with the
same Var on both sides as a special case, and we can provide better
selectivity estimate than for the generic case.

For example for (a = a) we know it's 1.0, because all rows are expected
to match. Similarly for (a != a) , wich has selectivity 0.0. And the
same logic can be applied to inequality comparisons, like (a < a) etc.

In principle, those clauses are a bit strange and queries are unlikely
to use them. But query generators sometimes do silly things, and these
checks are quite cheap so it's likely a win.
---
 src/backend/utils/adt/selfuncs.c | 77 +++++++++++++++++++++++++++++++-
 1 file changed, 76 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 10895fb287..59038895ec 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -210,7 +210,8 @@ static bool get_actual_variable_endpoint(Relation heapRel,
 										 MemoryContext outercontext,
 										 Datum *endpointDatum);
 static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
-
+static bool matching_restriction_variables(PlannerInfo *root, List *args,
+										   int varRelid);
 
 /*
  *		eqsel			- Selectivity of "=" for any data types.
@@ -256,6 +257,14 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
 		}
 	}
 
+	/*
+	 * It it's (variable = variable) with the same variable on both sides, it's
+	 * a special case and we know it's not expected to filter anything, so we
+	 * estimate the selectivity as 1.0 (or 0.0 if it's negated).
+	 */
+	if (matching_restriction_variables(root, args, varRelid))
+		return (negate) ? 0.0 : 1.0;
+
 	/*
 	 * If expression is not variable = something or something = variable, then
 	 * punt and return a default estimate.
@@ -1408,6 +1417,22 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
 	Oid			consttype;
 	double		selec;
 
+	/*
+	 * Handle (variable < variable) and (variable <= variable) with the same
+	 * variable on both sides as a special case. The strict inequality should
+	 * not match any rows, hence selectivity is 0.0. The other case is about
+	 * the same as equality, so selectivity is 1.0.
+	 */
+	if (matching_restriction_variables(root, args, varRelid))
+	{
+		/* The case with equality matches all rows, so estimate it as 1.0. */
+		if (iseq)
+			PG_RETURN_FLOAT8(1.0);
+
+		/* Strict inequality matches nothing, so selectivity is 0.0. */
+		PG_RETURN_FLOAT8(0.0);
+	}
+
 	/*
 	 * If expression is not variable op something or something op variable,
 	 * then punt and return a default estimate.
@@ -4871,6 +4896,56 @@ get_restriction_variable(PlannerInfo *root, List *args, int varRelid,
 	return false;
 }
 
+
+/*
+ * matching_restriction_variable
+ *		Examine the args of a restriction clause to see if it's of the
+ *		form (variable op variable) with the same variable on both sides.
+ *
+ * Inputs:
+ *	root: the planner info
+ *	args: clause argument list
+ *	varRelid: see specs for restriction selectivity functions
+ *
+ * Returns true if the same variable is on both sides, otherwise false.
+ */
+static bool
+matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
+{
+	Node	   *left,
+			   *right;
+	VariableStatData ldata;
+	VariableStatData rdata;
+	bool		res = false;
+
+	/* Fail if not a binary opclause (probably shouldn't happen) */
+	if (list_length(args) != 2)
+		return false;
+
+	left = (Node *) linitial(args);
+	right = (Node *) lsecond(args);
+
+	/*
+	 * Examine both sides.  Note that when varRelid is nonzero, Vars of other
+	 * relations will be treated as pseudoconstants.
+	 */
+	examine_variable(root, left, varRelid, &ldata);
+	examine_variable(root, right, varRelid, &rdata);
+
+	/*
+	 * If both sides are variable, and are equal, we win.
+	 */
+	if ((ldata.rel != NULL && rdata.rel != NULL) &&
+		equal(ldata.var, rdata.var))
+		res = true;
+
+	/* We don't need the stats. */
+	ReleaseVariableStats(ldata);
+	ReleaseVariableStats(rdata);
+
+	return res;
+}
+
 /*
  * get_join_variables
  *		Apply examine_variable() to each side of a join clause.
-- 
2.31.1

0002-simplification.patchtext/x-patch; charset=UTF-8; name=0002-simplification.patchDownload
From 8e454f1aae54317d1d346012c06179ba62fbef47 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Mon, 13 Dec 2021 02:28:27 +0100
Subject: [PATCH 2/5] simplification

---
 src/backend/utils/adt/selfuncs.c | 46 +++++++++++++++++---------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 59038895ec..1031a0fd9e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -260,7 +260,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
 	/*
 	 * It it's (variable = variable) with the same variable on both sides, it's
 	 * a special case and we know it's not expected to filter anything, so we
-	 * estimate the selectivity as 1.0 (or 0.0 if it's negated).
+	 * estimate the selectivity as 1.0 (or 0.0 when it's negated).
 	 */
 	if (matching_restriction_variables(root, args, varRelid))
 		return (negate) ? 0.0 : 1.0;
@@ -4896,11 +4896,11 @@ get_restriction_variable(PlannerInfo *root, List *args, int varRelid,
 	return false;
 }
 
-
 /*
  * matching_restriction_variable
- *		Examine the args of a restriction clause to see if it's of the
- *		form (variable op variable) with the same variable on both sides.
+ *		Check if the two arguments of a restriction clause refer to the same
+ *		variable, i.e. if the condition is of the form (variable op variable).
+ *		We can deduce selectivity for such (in)equality clauses.
  *
  * Inputs:
  *	root: the planner info
@@ -4914,9 +4914,7 @@ matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
 {
 	Node	   *left,
 			   *right;
-	VariableStatData ldata;
-	VariableStatData rdata;
-	bool		res = false;
+	Var		   *lvar;
 
 	/* Fail if not a binary opclause (probably shouldn't happen) */
 	if (list_length(args) != 2)
@@ -4925,25 +4923,29 @@ matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
 	left = (Node *) linitial(args);
 	right = (Node *) lsecond(args);
 
-	/*
-	 * Examine both sides.  Note that when varRelid is nonzero, Vars of other
-	 * relations will be treated as pseudoconstants.
-	 */
-	examine_variable(root, left, varRelid, &ldata);
-	examine_variable(root, right, varRelid, &rdata);
+	/* Look inside any binary-compatible relabeling */
+
+	if (IsA(left, RelabelType))
+		left = (Node *) ((RelabelType *) left)->arg;
+
+	if (IsA(right, RelabelType))
+		right = (Node *) ((RelabelType *) right)->arg;
+
+	/* We only care about simple Vars from our relation for now */
+	if (!IsA(left, Var) || !IsA(right, Var))
+		return false;
+
+	lvar = (Var *) left;
 
 	/*
-	 * If both sides are variable, and are equal, we win.
+	 * We only check one variable, because we call equals() later, which checks
+	 * the other variable automatically.
 	 */
-	if ((ldata.rel != NULL && rdata.rel != NULL) &&
-		equal(ldata.var, rdata.var))
-		res = true;
-
-	/* We don't need the stats. */
-	ReleaseVariableStats(ldata);
-	ReleaseVariableStats(rdata);
+	if ((varRelid != 0) && (varRelid != lvar->varno))
+		return false;
 
-	return res;
+	/* The two variables need to match */
+	return equal(left, right);
 }
 
 /*
-- 
2.31.1

0003-relax-the-restrictions.patchtext/x-patch; charset=UTF-8; name=0003-relax-the-restrictions.patchDownload
From d1b54830eea71b6afb465d4c86d9a4a92742dc98 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Mon, 13 Dec 2021 02:48:58 +0100
Subject: [PATCH 3/5] relax the restrictions

---
 src/backend/utils/adt/selfuncs.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 1031a0fd9e..6b0dde66bf 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4914,7 +4914,7 @@ matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
 {
 	Node	   *left,
 			   *right;
-	Var		   *lvar;
+	Bitmapset  *varnos;
 
 	/* Fail if not a binary opclause (probably shouldn't happen) */
 	if (list_length(args) != 2)
@@ -4931,17 +4931,24 @@ matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
 	if (IsA(right, RelabelType))
 		right = (Node *) ((RelabelType *) right)->arg;
 
-	/* We only care about simple Vars from our relation for now */
-	if (!IsA(left, Var) || !IsA(right, Var))
+	/*
+	 * Check if it's safe to compare the two expressions. The expressions
+	 * must not contain any volatile expressions, and must belong to the
+	 * same relation.
+	 *
+	 * XXX We do a small trick - we validate just one expression, and then
+	 * check it's equal to the other side.
+	 */
+	if (contain_volatile_functions(left))
 		return false;
 
-	lvar = (Var *) left;
+	/* Check it belongs to a single relation. */
+	varnos = pull_varnos(root, left);
 
-	/*
-	 * We only check one variable, because we call equals() later, which checks
-	 * the other variable automatically.
-	 */
-	if ((varRelid != 0) && (varRelid != lvar->varno))
+	if (bms_num_members(varnos) != 1)
+		return false;
+
+	if ((varRelid != 0) && (!bms_is_member(varRelid, varnos)))
 		return false;
 
 	/* The two variables need to match */
-- 
2.31.1

0004-main-patch.patchtext/x-patch; charset=UTF-8; name=0004-main-patch.patchDownload
From 1f6949e9eadcb961738e4314414682e179462d16 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 15 Aug 2021 13:23:13 +0200
Subject: [PATCH 4/5] main patch

---
 src/backend/optimizer/path/clausesel.c        |  37 +++-
 src/backend/statistics/extended_stats.c       |  83 ++++++---
 src/backend/statistics/mcv.c                  | 172 +++++++++++++-----
 .../statistics/extended_stats_internal.h      |   4 +-
 src/test/regress/expected/stats_ext.out       |  96 ++++++++++
 src/test/regress/sql/stats_ext.sql            |  26 +++
 6 files changed, 341 insertions(+), 77 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d263ecf082..6a7e9ceea5 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -714,6 +714,7 @@ clause_selectivity_ext(PlannerInfo *root,
 	Selectivity s1 = 0.5;		/* default for any unhandled clause type */
 	RestrictInfo *rinfo = NULL;
 	bool		cacheable = false;
+	Node	   *src = clause;
 
 	if (clause == NULL)			/* can this still happen? */
 		return s1;
@@ -871,11 +872,37 @@ clause_selectivity_ext(PlannerInfo *root,
 		}
 		else
 		{
-			/* Estimate selectivity for a restriction clause. */
-			s1 = restriction_selectivity(root, opno,
-										 opclause->args,
-										 opclause->inputcollid,
-										 varRelid);
+			/*
+			 * It might be a single (Expr op Expr) clause, which goes here due
+			 * to the optimization at the beginning of clauselist_selectivity.
+			 * So we try applying extended stats first, and then fall back to
+			 * restriction_selectivity.
+			 */
+			bool	estimated = false;
+
+			if (use_extended_stats)
+			{
+				List	   *clauses = list_make1(src);
+				RelOptInfo *rel = find_single_rel_for_clauses(root, clauses);
+
+				if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+				{
+					Bitmapset  *estimatedclauses = NULL;
+
+					s1 = statext_clauselist_selectivity(root, clauses, varRelid,
+														jointype, sjinfo, rel,
+														&estimatedclauses, false);
+
+					estimated = (bms_num_members(estimatedclauses) == 1);
+				}
+			}
+
+			/* Estimate selectivity for a restriction clause (fallback). */
+			if (!estimated)
+				s1 = restriction_selectivity(root, opno,
+											 opclause->args,
+											 opclause->inputcollid,
+											 varRelid);
 		}
 
 		/*
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 69ca52094f..e1fcfd0711 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1351,19 +1351,27 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		return true;
 	}
 
-	/* (Var/Expr op Const) or (Const op Var/Expr) */
+	/*
+	 * Three opclause variants are supported: (Expr op Const), (Const op Expr),
+	 * (Expr op Expr). That means we may need to analyze one or two expressions
+	 * to make sure the opclause is compatible with extended stats.
+	 */
 	if (is_opclause(clause))
 	{
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
-		Node	   *clause_expr;
+		ListCell   *lc;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
-		/* Check if the expression has the right shape */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		/*
+		 * Check if the expression has the right shape. This returns either one
+		 * or two expressions, depending on whether there is a Const.
+		 */
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
 		/*
@@ -1403,13 +1411,31 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			!get_func_leakproof(get_opcode(expr->opno)))
 			return false;
 
-		/* Check (Var op Const) or (Const op Var) clauses by recursing. */
-		if (IsA(clause_expr, Var))
-			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+		/*
+		 * There's always at least one expression, otherwise the clause would
+		 * not be considered compatible.
+		 */
+		Assert(list_length(clause_exprs) >= 1);
+
+		/*
+		 * Check all expressions by recursing. Var expressions are handled as
+		 * a special case (to match it to attnums etc.)
+		 */
+		foreach (lc, clause_exprs)
+		{
+			Node *clause_expr = (Node *) lfirst(lc);
+
+			if (IsA(clause_expr, Var))
+			{
+				/* if the Var is incompatible, the whole clause is incompatible */
+				if (!statext_is_compatible_clause_internal(root, clause_expr,
+														   relid, attnums, exprs))
+					return false;
+			}
+			else	/* generic expression */
+				*exprs = lappend(*exprs, clause_expr);
+		}
 
-		/* Otherwise we have (Expr op Const) or (Const op Expr). */
-		*exprs = lappend(*exprs, clause_expr);
 		return true;
 	}
 
@@ -1419,15 +1445,21 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) clause;
 		Node	   *clause_expr;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
+		/* There has to be one expression exactly. */
+		Assert(list_length(clause_exprs) == 1);
+
+		clause_expr = (Node *) linitial(clause_exprs);
+
 		/*
 		 * If it's not one of the supported operators ("=", "<", ">", etc.),
 		 * just ignore the clause, as it's not compatible with MCV lists.
@@ -2013,20 +2045,19 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  * examine_opclause_args
  *		Split an operator expression's arguments into Expr and Const parts.
  *
- * Attempts to match the arguments to either (Expr op Const) or (Const op
- * Expr), possibly with a RelabelType on top. When the expression matches this
- * form, returns true, otherwise returns false.
+ * Attempts to match the arguments to either (Expr op Const) or (Const op Expr)
+ * or (Expr op Expr), possibly with a RelabelType on top. When the expression
+ * matches this form, returns true, otherwise returns false.
  *
  * Optionally returns pointers to the extracted Expr/Const nodes, when passed
- * non-null pointers (exprp, cstp and expronleftp). The expronleftp flag
+ * non-null pointers (exprsp, cstp and expronleftp). The expronleftp flag
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, Node **exprp, Const **cstp,
-					  bool *expronleftp)
+examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp)
 {
-	Node	   *expr;
-	Const	   *cst;
+	List	   *exprs = NIL;
+	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
@@ -2046,22 +2077,26 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 
 	if (IsA(rightop, Const))
 	{
-		expr = (Node *) leftop;
+		exprs = lappend(exprs, leftop);
 		cst = (Const *) rightop;
 		expronleft = true;
 	}
 	else if (IsA(leftop, Const))
 	{
-		expr = (Node *) rightop;
+		exprs = lappend(exprs, rightop);
 		cst = (Const *) leftop;
 		expronleft = false;
 	}
 	else
-		return false;
+	{
+		exprs = lappend(exprs, leftop);
+		exprs = lappend(exprs, rightop);
+		expronleft = false;
+	}
 
 	/* return pointers to the extracted parts if requested */
-	if (exprp)
-		*exprp = expr;
+	if (exprsp)
+		*exprsp = exprs;
 
 	if (cstp)
 		*cstp = cst;
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index b350fc5f7b..ac7d7b8978 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1644,78 +1644,154 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			Node	   *clause_expr2;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
-			int			idx;
 			Oid			collid;
 
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
-			/* match the attribute/expression to a dimension of the statistic */
-			idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+			if (cst)	/* Expr op Const */
+			{
+				int idx;
 
-			Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert(list_length(clause_exprs) == 1);
+				clause_expr = (Node *) linitial(clause_exprs);
 
-			/*
-			 * Walk through the MCV items and evaluate the current clause. We
-			 * can skip items that were already ruled out, and terminate if
-			 * there are no remaining MCV items that might possibly match.
-			 */
-			for (i = 0; i < mcvlist->nitems; i++)
-			{
-				bool		match = true;
-				MCVItem    *item = &mcvlist->items[i];
+				/* match the attribute/expression to a dimension of the statistic */
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
 
-				Assert(idx >= 0);
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * When the MCV item or the Const value is NULL we can treat
-				 * this as a mismatch. We must not call the operator because
-				 * of strictness.
+				 * Walk through the MCV items and evaluate the current clause. We
+				 * can skip items that were already ruled out, and terminate if
+				 * there are no remaining MCV items that might possibly match.
 				 */
-				if (item->isnull[idx] || cst->constisnull)
+				for (i = 0; i < mcvlist->nitems; i++)
 				{
-					matches[i] = RESULT_MERGE(matches[i], is_or, false);
-					continue;
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					Assert(idx >= 0);
+
+					/*
+					 * When the MCV item or the Const value is NULL we can treat
+					 * this as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || cst->constisnull)
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap. Once
+					 * the value gets false for AND-lists, or true for OR-lists,
+					 * we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics, but
+					 * we can use the collation for the attribute itself, as
+					 * stored in varcollid. We do reset the statistics after a
+					 * type change (including collation change), so this is OK.
+					 * For expressions, we use the collation extracted from the
+					 * expression itself.
+					 */
+					if (expronleft)
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   item->values[idx],
+															   cst->constvalue));
+					else
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   cst->constvalue,
+															   item->values[idx]));
+
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
 				}
+			}
+			else	/* Expr op Expr */
+			{
+				int			idx;
+				int			idx2;
+
+				Assert(list_length(clause_exprs) == 2);
+
+				clause_expr = (Node *) linitial(clause_exprs);
+				clause_expr2 = (Node *) lsecond(clause_exprs);
+
+				Assert(clause_expr2);
+				Assert(!expronleft);
 
 				/*
-				 * Skip MCV items that can't change result in the bitmap. Once
-				 * the value gets false for AND-lists, or true for OR-lists,
-				 * we don't need to look at more clauses.
+				 * Match the expressions to a dimension of the statistic.
+				 *
+				 * XXX Can the collations differ?
 				 */
-				if (RESULT_IS_FINAL(matches[i], is_or))
-					continue;
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+				idx2 = mcv_match_expression(clause_expr2, keys, exprs, &collid);
+
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert((idx2 >= 0) && (idx2 < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * First check whether the constant is below the lower
-				 * boundary (in that case we can skip the bucket, because
-				 * there's no overlap).
-				 *
-				 * We don't store collations used to build the statistics, but
-				 * we can use the collation for the attribute itself, as
-				 * stored in varcollid. We do reset the statistics after a
-				 * type change (including collation change), so this is OK.
-				 * For expressions, we use the collation extracted from the
-				 * expression itself.
+				 * Walk through the MCV items and evaluate the current clause.
+				 * We can skip items that were already ruled out, and
+				 * terminate if there are no remaining MCV items that might
+				 * possibly match.
 				 */
-				if (expronleft)
+				for (i = 0; i < mcvlist->nitems; i++)
+				{
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					/*
+					 * When either of the MCV items is NULL we can treat this
+					 * as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || item->isnull[idx2])
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap.
+					 * Once the value gets false for AND-lists, or true for
+					 * OR-lists, we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics,
+					 * but we can use the collation for the attribute itself,
+					 * as stored in varcollid. We do reset the statistics after
+					 * a type change (including collation change), so this is
+					 * OK. We may need to relax this after allowing extended
+					 * statistics on expressions.
+					 */
 					match = DatumGetBool(FunctionCall2Coll(&opproc,
 														   collid,
 														   item->values[idx],
-														   cst->constvalue));
-				else
-					match = DatumGetBool(FunctionCall2Coll(&opproc,
-														   collid,
-														   cst->constvalue,
-														   item->values[idx]));
+														   item->values[idx2]));
 
-				/* update the match bitmap with the result */
-				matches[i] = RESULT_MERGE(matches[i], is_or, match);
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
+				}
 			}
 		}
 		else if (IsA(clause, ScalarArrayOpExpr))
@@ -1725,6 +1801,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
 			Oid			collid;
@@ -1742,11 +1819,14 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
 			Assert(expronleft);
+			Assert(list_length(clause_exprs) == 1);
+
+			clause_expr = (Node *) linitial(clause_exprs);
 
 			/* XXX what if (cst->constisnull == NULL)? */
 			if (!cst->constisnull)
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 55cd9252a5..1f30fa9060 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -99,8 +99,8 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									MultiSortSupport mss,
 									int numattrs, AttrNumber *attnums);
 
-extern bool examine_opclause_args(List *args, Node **exprp,
-								  Const **cstp, bool *expronleftp);
+extern bool examine_opclause_args(List *args, List **exprs, Const **cstp,
+								  bool *expronleftp);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
 											 Selectivity mcv_sel,
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index c60ba45aba..cabc97c43b 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1917,6 +1917,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        343 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      1667 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2064,6 +2076,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        200 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      3750 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2473,6 +2497,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       3750 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+        25 |   2500
+(1 row)
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 ANALYZE mcv_lists;
@@ -2506,6 +2536,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       2500 |   2500
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -2600,6 +2636,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
       1094 |      0
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      9950 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 ANALYZE mcv_lists_bool;
@@ -2755,6 +2803,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 (1 row)
 
 DROP TABLE mcv_lists_partial;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -2814,6 +2874,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       2649 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+         1 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
 CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
@@ -2860,6 +2938,24 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       1571 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
 DROP TABLE mcv_lists_multi;
 -- statistics on integer expressions
 CREATE TABLE expr_stats (a int, b int, c int);
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 6fb37962a7..9210f144cb 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -956,6 +956,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -1009,6 +1013,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -1208,6 +1216,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 
@@ -1226,6 +1236,8 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -1301,6 +1313,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 
@@ -1386,6 +1402,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 
 DROP TABLE mcv_lists_partial;
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -1413,6 +1433,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
@@ -1427,6 +1450,9 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE b = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
 
 DROP TABLE mcv_lists_multi;
 
-- 
2.31.1

0005-Don-t-treat-Var-op-Var-as-simple-clauses.patchtext/x-patch; charset=UTF-8; name=0005-Don-t-treat-Var-op-Var-as-simple-clauses.patchDownload
From 87fb7e95f6694c7f9f79c28fc660211af9cfa15b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Wed, 18 Aug 2021 11:53:45 +0200
Subject: [PATCH 5/5] Don't treat Var op Var as simple clauses

---
 src/backend/statistics/extended_stats.c       | 86 +++++++++++++++----
 src/backend/statistics/mcv.c                  |  4 +-
 .../statistics/extended_stats_internal.h      |  2 +-
 3 files changed, 74 insertions(+), 18 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index e1fcfd0711..bf4aa682ae 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1319,11 +1319,14 @@ choose_best_statistics(List *stats, char requiredkind,
  * statext_is_compatible_clause. It needs to be split like this because
  * of recursion.  The attnums bitmap is an input/output parameter collecting
  * attribute numbers from all compatible clauses (recursively).
+ *
+ * XXX The issimple variable is expected to be initialized by the caller, we
+ * just update it while recursively analyzing the current clause.
  */
 static bool
 statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 									  Index relid, Bitmapset **attnums,
-									  List **exprs)
+									  List **exprs, bool *issimple)
 {
 	/* Look inside any binary-compatible relabeling (as in examine_variable) */
 	if (IsA(clause, RelabelType))
@@ -1371,7 +1374,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		 * Check if the expression has the right shape. This returns either one
 		 * or two expressions, depending on whether there is a Const.
 		 */
-		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL, issimple))
 			return false;
 
 		/*
@@ -1420,6 +1423,12 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/*
 		 * Check all expressions by recursing. Var expressions are handled as
 		 * a special case (to match it to attnums etc.)
+		 *
+		 * An opclause is simple if it's (Expr op Const) or (Const op Expr). We
+		 * have already checked the overall shape in examine_opclause_args, but
+		 * we haven't checked the expressions are simple (i.e. pretty much Var),
+		 * so we need to check that now. If we discover a complex expression, we
+		 * consider the whole clause complex.
 		 */
 		foreach (lc, clause_exprs)
 		{
@@ -1429,11 +1438,17 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			{
 				/* if the Var is incompatible, the whole clause is incompatible */
 				if (!statext_is_compatible_clause_internal(root, clause_expr,
-														   relid, attnums, exprs))
+														   relid, attnums, exprs,
+														   issimple))
 					return false;
 			}
 			else	/* generic expression */
+			{
 				*exprs = lappend(*exprs, clause_expr);
+
+				/* switch to false if there are any complex clauses */
+				*issimple = false;
+			}
 		}
 
 		return true;
@@ -1452,7 +1467,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL, issimple))
 			return false;
 
 		/* There has to be one expression exactly. */
@@ -1500,7 +1515,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/* Check Var IN Array clauses by recursing. */
 		if (IsA(clause_expr, Var))
 			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+														 relid, attnums, exprs,
+														 issimple);
 
 		/* Otherwise we have Expr IN Array. */
 		*exprs = lappend(*exprs, clause_expr);
@@ -1529,6 +1545,14 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		BoolExpr   *expr = (BoolExpr *) clause;
 		ListCell   *lc;
 
+		/*
+		 * All AND/OR clauses are considered complex, even if all arguments are
+		 * simple clauses. For NOT clauses we need to check the argument and then
+		 * we can update the flag.
+		 */
+		if (!is_notclause(clause))
+			*issimple = false;
+
 		foreach(lc, expr->args)
 		{
 			/*
@@ -1537,7 +1561,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			 */
 			if (!statext_is_compatible_clause_internal(root,
 													   (Node *) lfirst(lc),
-													   relid, attnums, exprs))
+													   relid, attnums, exprs,
+													   issimple))
 				return false;
 		}
 
@@ -1552,7 +1577,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/* Check Var IS NULL clauses by recursing. */
 		if (IsA(nt->arg, Var))
 			return statext_is_compatible_clause_internal(root, (Node *) (nt->arg),
-														 relid, attnums, exprs);
+														 relid, attnums, exprs,
+														 issimple);
 
 		/* Otherwise we have Expr IS NULL. */
 		*exprs = lappend(*exprs, nt->arg);
@@ -1588,22 +1614,35 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
  */
 static bool
 statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
-							 Bitmapset **attnums, List **exprs)
+							 Bitmapset **attnums, List **exprs, bool *issimple)
 {
 	RangeTblEntry *rte = root->simple_rte_array[relid];
 	RestrictInfo *rinfo = (RestrictInfo *) clause;
 	int			clause_relid;
 	Oid			userid;
 
+	/*
+	 * Clauses are considered simple by default, and we mark them as complex
+	 * when we discover a complex part.
+	 */
+	*issimple = true;
+
 	/*
 	 * Special-case handling for bare BoolExpr AND clauses, because the
 	 * restrictinfo machinery doesn't build RestrictInfos on top of AND
 	 * clauses.
+	 *
+	 * AND clauses are considered complex, even if all arguments are
+	 * simple clauses.
 	 */
 	if (is_andclause(clause))
 	{
 		BoolExpr   *expr = (BoolExpr *) clause;
 		ListCell   *lc;
+		bool		tmp = false;	/* ignored result */
+
+		/* AND clauses are complex, even if the arguments are simple. */
+		*issimple = false;
 
 		/*
 		 * Check that each sub-clause is compatible.  We expect these to be
@@ -1612,7 +1651,7 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 		foreach(lc, expr->args)
 		{
 			if (!statext_is_compatible_clause(root, (Node *) lfirst(lc),
-											  relid, attnums, exprs))
+											  relid, attnums, exprs, &tmp))
 				return false;
 		}
 
@@ -1634,7 +1673,7 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 
 	/* Check the clause and determine what attributes it references. */
 	if (!statext_is_compatible_clause_internal(root, (Node *) rinfo->clause,
-											   relid, attnums, exprs))
+											   relid, attnums, exprs, issimple))
 		return false;
 
 	/*
@@ -1724,6 +1763,7 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 	ListCell   *l;
 	Bitmapset **list_attnums;	/* attnums extracted from the clause */
 	List	  **list_exprs;		/* expressions matched to any statistic */
+	bool	  *list_simple;		/* marks simple expressions */
 	int			listidx;
 	Selectivity sel = (is_or) ? 0.0 : 1.0;
 
@@ -1737,6 +1777,9 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 	/* expressions extracted from complex expressions */
 	list_exprs = (List **) palloc(sizeof(Node *) * list_length(clauses));
 
+	/* expressions determined to be simple (single expression) */
+	list_simple = (bool *) palloc(sizeof(bool) * list_length(clauses));
+
 	/*
 	 * Pre-process the clauses list to extract the attnums and expressions
 	 * seen in each item.  We need to determine if there are any clauses which
@@ -1754,17 +1797,21 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 		Node	   *clause = (Node *) lfirst(l);
 		Bitmapset  *attnums = NULL;
 		List	   *exprs = NIL;
+		bool		issimple = false;
 
 		if (!bms_is_member(listidx, *estimatedclauses) &&
-			statext_is_compatible_clause(root, clause, rel->relid, &attnums, &exprs))
+			statext_is_compatible_clause(root, clause, rel->relid,
+										 &attnums, &exprs, &issimple))
 		{
 			list_attnums[listidx] = attnums;
 			list_exprs[listidx] = exprs;
+			list_simple[listidx] = issimple;
 		}
 		else
 		{
 			list_attnums[listidx] = NULL;
 			list_exprs[listidx] = NIL;
+			list_simple[listidx] = false;
 		}
 
 		listidx++;
@@ -1801,6 +1848,8 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 		listidx = -1;
 		foreach(l, clauses)
 		{
+			Node *clause = (Node *) lfirst(l);
+
 			/* Increment the index before we decide if to skip the clause. */
 			listidx++;
 
@@ -1839,13 +1888,12 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 			/* record simple clauses (single column or expression) */
 			if ((list_attnums[listidx] == NULL &&
 				 list_length(list_exprs[listidx]) == 1) ||
-				(list_exprs[listidx] == NIL &&
-				 bms_membership(list_attnums[listidx]) == BMS_SINGLETON))
+				 list_simple[listidx])
 				simple_clauses = bms_add_member(simple_clauses,
 												list_length(stat_clauses));
 
 			/* add clause to list and mark it as estimated */
-			stat_clauses = lappend(stat_clauses, (Node *) lfirst(l));
+			stat_clauses = lappend(stat_clauses, clause);
 			*estimatedclauses = bms_add_member(*estimatedclauses, listidx);
 
 			/*
@@ -2054,13 +2102,15 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp)
+examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp,
+					  bool *issimplep)
 {
 	List	   *exprs = NIL;
 	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
+	bool		issimple;
 
 	/* enforced by statext_is_compatible_clause_internal */
 	Assert(list_length(args) == 2);
@@ -2080,18 +2130,21 @@ examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp
 		exprs = lappend(exprs, leftop);
 		cst = (Const *) rightop;
 		expronleft = true;
+		issimple = true;
 	}
 	else if (IsA(leftop, Const))
 	{
 		exprs = lappend(exprs, rightop);
 		cst = (Const *) leftop;
 		expronleft = false;
+		issimple = true;
 	}
 	else
 	{
 		exprs = lappend(exprs, leftop);
 		exprs = lappend(exprs, rightop);
 		expronleft = false;
+		issimple = false;
 	}
 
 	/* return pointers to the extracted parts if requested */
@@ -2104,6 +2157,9 @@ examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp
 	if (expronleftp)
 		*expronleftp = expronleft;
 
+	if (issimplep)
+		*issimplep = (*issimplep && issimple);
+
 	return true;
 }
 
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index ac7d7b8978..eb3267b164 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1653,7 +1653,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft, NULL))
 				elog(ERROR, "incompatible clause");
 
 			if (cst)	/* Expr op Const */
@@ -1819,7 +1819,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft, NULL))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 1f30fa9060..d86cc4184b 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -100,7 +100,7 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									int numattrs, AttrNumber *attnums);
 
 extern bool examine_opclause_args(List *args, List **exprs, Const **cstp,
-								  bool *expronleftp);
+								  bool *expronleftp, bool *issimplep);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
 											 Selectivity mcv_sel,
-- 
2.31.1

#43Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#41)
Re: Use extended statistics to estimate (Var op Var) clauses

On Sun, Dec 12, 2021 at 6:04 PM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

On 8/31/21 00:14, Zhihong Yu wrote:

Hi,
For patch 0002,

+                   s1 = statext_clauselist_selectivity(root, clauses,
varRelid,
+                                                       jointype,
sjinfo, rel,
+
&estimatedclauses, false);
+
+                   estimated = (bms_num_members(estimatedclauses) == 1);

I took a look at clauselist_apply_dependencies() (called by
statext_clauselist_selectivity) where estimatedclauses is modified.
Since the caller would not use the returned Selectivity if number of
elements in estimatedclauses is greater than 1, I wonder
if a parameter can be added to clauselist_apply_dependencies() which
indicates early return if the second element is added

to estimatedclauses.

Hmmm, I'm not sure I understand your point. Are you suggesting there's a
bug in not updating the bitmap, or would this be an optimization? Can
you give an example of a query affected by this?

Hi,

My previous comment was from 3 months ago - let me see if I can come up
with an example.

Cheers

Show quoted text

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#44Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tomas Vondra (#42)
Re: Use extended statistics to estimate (Var op Var) clauses

On Dec 12, 2021, at 6:21 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

<0001-Improve-estimates-for-Var-op-Var-with-the-same-Var.patch>

+ * It it's (variable = variable) with the same variable on both sides, it's

s/It it's/If it's/

0001 lacks regression coverage.

<0002-simplification.patch>

Changing comments introduced by patch 0001 in patch 0002 just creates git churn:

-    * estimate the selectivity as 1.0 (or 0.0 if it's negated).
+    * estimate the selectivity as 1.0 (or 0.0 when it's negated).

and:

  * matching_restriction_variable
- *     Examine the args of a restriction clause to see if it's of the
- *     form (variable op variable) with the same variable on both sides.
+ *     Check if the two arguments of a restriction clause refer to the same
+ *     variable, i.e. if the condition is of the form (variable op variable).
+ *     We can deduce selectivity for such (in)equality clauses.

0002 also lacks regression coverage.

<0003-relax-the-restrictions.patch>

0003 also lacks regression coverage.

<0004-main-patch.patch>

Ok.

<0005-Don-t-treat-Var-op-Var-as-simple-clauses.patch>

0005 again lacks regression coverage.

There might be a problem in how selectivity thinks about comparison between identical columns from the NEW and OLD pseudotables. To show this, add an Assert to see where matching_restriction_variables() might return true:

--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4952,6 +4952,8 @@ matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
                return false;
        /* The two variables need to match */
+       Assert(!equal(left, right));
+
        return equal(left, right);

This results in the regression tests failing on "update rtest_emp set ename = 'wiecx' where ename = 'wiecc';". It may seem counterintuitive that matching_restriction_variables() would return true for a where-clause with only one occurrence of variable "ename", until you read the rule defined in rules.sql:

create rule rtest_emp_upd as on update to rtest_emp where new.salary != old.salary do
insert into rtest_emplog values (new.ename, current_user,
'honored', new.salary, old.salary);

I think what's really happening here is that "new.salary != old.salary" is being processed by matching_restriction_variables() and returning that the new.salary refers to the same thing that old.salary refers to.

Here is the full stack trace, for reference:

(lldb) bt
* thread #1, stop reason = signal SIGSTOP
frame #0: 0x00007fff6d8fd33a libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff6d9b9e60 libsystem_pthread.dylib`pthread_kill + 430
frame #2: 0x00007fff6d884808 libsystem_c.dylib`abort + 120
frame #3: 0x00000001048a6c31 postgres`ExceptionalCondition(conditionName="!equal(left, right)", errorType="FailedAssertion", fileName="selfuncs.c", lineNumber=4955) at assert.c:69:2
frame #4: 0x000000010481e733 postgres`matching_restriction_variables(root=0x00007fe65e02b2d0, args=0x00007fe65e02bf38, varRelid=0) at selfuncs.c:4955:2
frame #5: 0x000000010480f63c postgres`eqsel_internal(fcinfo=0x00007ffeebb9aeb8, negate=true) at selfuncs.c:265:6
frame #6: 0x000000010481040a postgres`neqsel(fcinfo=0x00007ffeebb9aeb8) at selfuncs.c:565:2
frame #7: 0x00000001048b420c postgres`FunctionCall4Coll(flinfo=0x00007ffeebb9af38, collation=0, arg1=140627396440784, arg2=901, arg3=140627396443960, arg4=0) at fmgr.c:1212:11
frame #8: 0x00000001048b50e6 postgres`OidFunctionCall4Coll(functionId=102, collation=0, arg1=140627396440784, arg2=901, arg3=140627396443960, arg4=0) at fmgr.c:1448:9
* frame #9: 0x0000000104557e4d postgres`restriction_selectivity(root=0x00007fe65e02b2d0, operatorid=901, args=0x00007fe65e02bf38, inputcollid=0, varRelid=0) at plancat.c:1828:26
frame #10: 0x00000001044d4c76 postgres`clause_selectivity_ext(root=0x00007fe65e02b2d0, clause=0x00007fe65e02bfe8, varRelid=0, jointype=JOIN_INNER, sjinfo=0x0000000000000000, use_extended_stats=true) at clausesel.c:902:10
frame #11: 0x00000001044d4186 postgres`clauselist_selectivity_ext(root=0x00007fe65e02b2d0, clauses=0x00007fe65e02cdd0, varRelid=0, jointype=JOIN_INNER, sjinfo=0x0000000000000000, use_extended_stats=true) at clausesel.c:185:8
frame #12: 0x00000001044d3f97 postgres`clauselist_selectivity(root=0x00007fe65e02b2d0, clauses=0x00007fe65e02cdd0, varRelid=0, jointype=JOIN_INNER, sjinfo=0x0000000000000000) at clausesel.c:108:9
frame #13: 0x00000001044de192 postgres`set_baserel_size_estimates(root=0x00007fe65e02b2d0, rel=0x00007fe65e01e640) at costsize.c:4931:3
frame #14: 0x00000001044d16be postgres`set_plain_rel_size(root=0x00007fe65e02b2d0, rel=0x00007fe65e01e640, rte=0x00007fe65e02ac40) at allpaths.c:584:2
frame #15: 0x00000001044d0a79 postgres`set_rel_size(root=0x00007fe65e02b2d0, rel=0x00007fe65e01e640, rti=1, rte=0x00007fe65e02ac40) at allpaths.c:413:6
frame #16: 0x00000001044cd990 postgres`set_base_rel_sizes(root=0x00007fe65e02b2d0) at allpaths.c:324:3
frame #17: 0x00000001044cd660 postgres`make_one_rel(root=0x00007fe65e02b2d0, joinlist=0x00007fe65e02ccd8) at allpaths.c:186:2
frame #18: 0x0000000104511b3b postgres`query_planner(root=0x00007fe65e02b2d0, qp_callback=(postgres`standard_qp_callback at planner.c:3071), qp_extra=0x00007ffeebb9b648) at planmain.c:276:14
frame #19: 0x0000000104514dfc postgres`grouping_planner(root=0x00007fe65e02b2d0, tuple_fraction=0) at planner.c:1448:17
frame #20: 0x00000001045138fb postgres`subquery_planner(glob=0x00007fe66e010cb0, parse=0x00007fe66e0107a8, parent_root=0x0000000000000000, hasRecursion=false, tuple_fraction=0) at planner.c:1025:2
frame #21: 0x0000000104511faa postgres`standard_planner(parse=0x00007fe66e0107a8, query_string="update rtest_emp set ename = 'wiecx' where ename = 'wiecc';", cursorOptions=2048, boundParams=0x0000000000000000) at planner.c:406:9
frame #22: 0x0000000104511d3a postgres`planner(parse=0x00007fe66e0107a8, query_string="update rtest_emp set ename = 'wiecx' where ename = 'wiecc';", cursorOptions=2048, boundParams=0x0000000000000000) at planner.c:277:12
frame #23: 0x00000001046957fb postgres`pg_plan_query(querytree=0x00007fe66e0107a8, query_string="update rtest_emp set ename = 'wiecx' where ename = 'wiecc';", cursorOptions=2048, boundParams=0x0000000000000000) at postgres.c:848:9
frame #24: 0x00000001046959d2 postgres`pg_plan_queries(querytrees=0x00007fe65e02b170, query_string="update rtest_emp set ename = 'wiecx' where ename = 'wiecc';", cursorOptions=2048, boundParams=0x0000000000000000) at postgres.c:940:11
frame #25: 0x0000000104698d0d postgres`exec_simple_query(query_string="update rtest_emp set ename = 'wiecx' where ename = 'wiecc';") at postgres.c:1134:19
frame #26: 0x0000000104697f4c postgres`PostgresMain(dbname="regression", username="mark.dilger") at postgres.c:4498:7
frame #27: 0x000000010459b859 postgres`BackendRun(port=0x00007fe65dc043e0) at postmaster.c:4594:2
frame #28: 0x000000010459adba postgres`BackendStartup(port=0x00007fe65dc043e0) at postmaster.c:4322:3
frame #29: 0x0000000104599ade postgres`ServerLoop at postmaster.c:1802:7
frame #30: 0x0000000104597206 postgres`PostmasterMain(argc=8, argv=0x00007fe66dc06300) at postmaster.c:1474:11
frame #31: 0x000000010446be4f postgres`main(argc=8, argv=0x00007fe66dc06300) at main.c:198:3
frame #32: 0x00007fff6d7b5cc9 libdyld.dylib`start + 1

In frame 9, operatorid=901. Checking pg_operator.dat, that's:

{ oid => '901', descr => 'not equal',
oprname => '<>', oprleft => 'money', oprright => 'money', oprresult => 'bool',
oprcom => '<>(money,money)', oprnegate => '=(money,money)',
oprcode => 'cash_ne', oprrest => 'neqsel', oprjoin => 'neqjoinsel' },

Looking at frame 10, restrict_selectivity is being called with clause={OPEXPR :opno 901 :opfuncid 889 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 2 :vartype 790 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 3 :varattnosyn 2 :location -1} {VAR :varno 1 :varattno 2 :vartype 790 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location -1}) :location -1}

Maybe there is some reason this is ok. Any thoughts about it?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#45Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Mark Dilger (#44)
Re: Use extended statistics to estimate (Var op Var) clauses

On Dec 21, 2021, at 4:28 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

Maybe there is some reason this is ok.

... and there is. Sorry for the noise. The planner appears to be smart enough to know that column "salary" is not being changed, and therefore NEW.salary and OLD.salary are equal. If I test a different update statement that contains a new value for "salary", the added assertion is not triggered.

(I didn't quite realize what the clause's varnosyn field was telling me until after I hit "send".)


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#46Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tomas Vondra (#42)
Re: Use extended statistics to estimate (Var op Var) clauses

On Mon, 13 Dec 2021 at 02:21, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

I finally got around to this patch again, focusing mostly on the first
part that simply returns either 1.0 or 0.0 for Var op Var conditions
(i.e. the part not really using extended statistics).

Just starting to look at this again, starting with 0001 ...

This needs to account for nullfrac, since x = x is only true if x is not null.

I don't like how matching_restriction_variables() is adding a
non-trivial amount of overhead to each of these selectivity functions,
by calling examine_variable() for each side, duplicating what
get_restriction_variable() does later on.

I think it would probably be better to make the changes in
var_eq_non_const(), where all of this information is available, and
get rid of matching_restriction_variables() (it doesn't look like the
new scalarineqsel_wrapper() code really needs
matching_restriction_variables() - it can just use what
get_restriction_variable() already returns).

Regards,
Dean

#47Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Dean Rasheed (#46)
5 attachment(s)
Re: Use extended statistics to estimate (Var op Var) clauses

On 7/9/22 14:04, Dean Rasheed wrote:

On Mon, 13 Dec 2021 at 02:21, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

I finally got around to this patch again, focusing mostly on the first
part that simply returns either 1.0 or 0.0 for Var op Var conditions
(i.e. the part not really using extended statistics).

Just starting to look at this again, starting with 0001 ...

This needs to account for nullfrac, since x = x is only true if x is not null.

Right, I forgot to account for nullfrac.

I don't like how matching_restriction_variables() is adding a
non-trivial amount of overhead to each of these selectivity functions,
by calling examine_variable() for each side, duplicating what
get_restriction_variable() does later on.

But matching_restriction_variables() does not use examine_variable()
anymore. It did, originally, but 0002 removed all of that. Now it does
just pull_varnos() and that's it. I admit leaving those bits unsquashed
was a bit confusing, but the first part was really 0001+0002+0003.

I think it would probably be better to make the changes in
var_eq_non_const(), where all of this information is available, and
get rid of matching_restriction_variables() (it doesn't look like the
new scalarineqsel_wrapper() code really needs
matching_restriction_variables() - it can just use what
get_restriction_variable() already returns).

I'm not sure how could that help. var_eq_non_const only really applies
to eqsel_internal(), so we'd still need to deal with various other
places for inequality operators.

Attached is a rebased and somewhat cleaned up version of the patch
series, addressing the review comments so far and squashing the bits I
previously kept separate to showcase the changes.

I've also added a bunch of regression tests - queries with (Var op Var)
clauses of varying complexity, to demonstrate the effect of each patch.
I added them as 0001, so it's clear how the individual patches affect
the results.

I've also wrote a single generator that generates both data and queries
with (Var op Var) clauses, and then runs them on multiple connections to
compare the estimates. It requires some effort to run that (setting up
the clusters, ...) but shouldn't be too hard to get it working.

The results are pretty massive (thousands of queries), but a simple
summary showing percentiles (0.25, 0.5, 0.75, 0.9) for estimation error,
calculated as (+1 to deal with 0 actual rows)

abs(estimated - actual) / (actual + 1)

looks like this:

clauses | stats | master | patched
---------+-------+---------------------------+------------------------
1 | no | {0.39, 0.67, 4.17, 10000} | {0.29, 0.67, 1, 10000}
2 | no | {0.38, 0.79, 50, 9950} | {0.26, 0.73, 1, 3333}
3 | no | {0.3, 0.84, 50, 3317} | {0.22, 0.78, 1, 1111}
4 | no | {0.24, 0.84, 25, 1852} | {0.14, 0.78, 1, 370}
5 | no | {0.2, 0.85, 17, 1100} | {0.11, 0.78, 1, 50}
1 | yes | {0.39, 0.67, 4.17, 10000} | {0, 0.14, 1, 1}
2 | yes | {0.38, 0.79, 50, 9950} | {0, 0.15, 1, 1}
3 | yes | {0.3, 0.84, 50, 3317} | {0, 0.15, 1, 1}
4 | yes | {0.24, 0.84, 25, 1852} | {0, 0.17, 1, 1}
5 | yes | {0.2, 0.85, 17, 1100} | {0, 0.14, 1, 1}

This seems pretty good, IMO. Without extended stats on the columns,
there's only so much we can do, but even then the errors got much
smaller. With the stats it's clearly way better.

Of course, take this with a grain of salt - those are randomly generated
synthetic queries, with all queries being considered equally "likely".
But clearly some queries are more likely to appear in the applications,
and those are more important to estimate. However, the point of this was
to see if there are classes of queries that would get much worse, and I
haven't found anything like that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

generator.pytext/x-python; charset=UTF-8; name=generator.pyDownload
0001-Add-regression-tests-for-Var-op-Var-20220721.patchtext/x-patch; charset=UTF-8; name=0001-Add-regression-tests-for-Var-op-Var-20220721.patchDownload
From 6287b69f931a44c89c20208c5b41f79117d258f2 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Wed, 20 Jul 2022 11:09:46 +0200
Subject: [PATCH 1/4] Add regression tests for (Var op Var)

A number of queries to test estimation of Var op Var clauses. The number
of new tests is likely an overkill, intended to showcase effects of
parts of this patch series.
---
 src/test/regress/expected/stats_ext.out | 2354 +++++++++++++++++++++--
 src/test/regress/sql/stats_ext.sql      |  650 +++++++
 2 files changed, 2847 insertions(+), 157 deletions(-)

diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 042316aeed8..009c7cfd848 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1988,6 +1988,78 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        343 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = a');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != a');
+ estimated | actual 
+-----------+--------
+      4975 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= a');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < a');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > a');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= a');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = c');
+ estimated | actual 
+-----------+--------
+        25 |   1250
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != c');
+ estimated | actual 
+-----------+--------
+      4975 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      1667 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= c');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= c');
+ estimated | actual 
+-----------+--------
+      1667 |   1250
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2135,6 +2207,78 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
        200 |    200
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = a');
+ estimated | actual 
+-----------+--------
+      5000 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != a');
+ estimated | actual 
+-----------+--------
+      4975 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= a');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < a');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > a');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= a');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = c');
+ estimated | actual 
+-----------+--------
+        25 |   1250
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != c');
+ estimated | actual 
+-----------+--------
+      4975 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+ estimated | actual 
+-----------+--------
+      1667 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= c');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= c');
+ estimated | actual 
+-----------+--------
+      1667 |   1250
+(1 row)
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
  estimated | actual 
 -----------+--------
@@ -2544,288 +2688,1944 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
       3750 |   2500
 (1 row)
 
--- create statistics
-CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
-ANALYZE mcv_lists;
--- test pg_mcv_list_items with MCV list containing variable-length data and NULLs
-SELECT m.*
-  FROM pg_statistic_ext s, pg_statistic_ext_data d,
-       pg_mcv_list_items(d.stxdmcv) m
- WHERE s.stxname = 'mcv_lists_stats'
-   AND d.stxoid = s.oid;
- index |      values      |  nulls  | frequency | base_frequency 
--------+------------------+---------+-----------+----------------
-     0 | {NULL,x,x}       | {t,f,f} |       0.5 |           0.25
-     1 | {NULL,NULL,NULL} | {t,t,t} |       0.5 |           0.25
-(2 rows)
-
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = ''x'' OR d = ''x''');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
  estimated | actual 
 -----------+--------
-      2500 |   2500
+        25 |   2500
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''x'' OR d = ''x''');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a > a');
  estimated | actual 
 -----------+--------
-      2500 |   2500
+         8 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a < a');
  estimated | actual 
 -----------+--------
-      2500 |   2500
+         8 |      0
 (1 row)
 
--- mcv with pass-by-ref fixlen types, e.g. uuid
-CREATE TABLE mcv_lists_uuid (
-    a UUID,
-    b UUID,
-    c UUID
-)
-WITH (autovacuum_enabled = off);
-INSERT INTO mcv_lists_uuid (a, b, c)
-     SELECT
-         md5(mod(i,100)::text)::uuid,
-         md5(mod(i,50)::text)::uuid,
-         md5(mod(i,25)::text)::uuid
-     FROM generate_series(1,5000) s(i);
-ANALYZE mcv_lists_uuid;
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_uuid WHERE a = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND b = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc''');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a = a');
  estimated | actual 
 -----------+--------
-         1 |     50
+         1 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_uuid WHERE a = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND b = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND c = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc''');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT a = a');
  estimated | actual 
 -----------+--------
-         1 |     50
+        25 |      0
 (1 row)
 
-CREATE STATISTICS mcv_lists_uuid_stats (mcv) ON a, b, c
-  FROM mcv_lists_uuid;
-ANALYZE mcv_lists_uuid;
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_uuid WHERE a = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND b = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc''');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a > a');
  estimated | actual 
 -----------+--------
-        50 |     50
+      1683 |   2500
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_uuid WHERE a = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND b = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND c = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc''');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a < a');
  estimated | actual 
 -----------+--------
-        50 |     50
+      1683 |   2500
 (1 row)
 
-DROP TABLE mcv_lists_uuid;
--- mcv with arrays
-CREATE TABLE mcv_lists_arrays (
-    a TEXT[],
-    b NUMERIC[],
-    c INT[]
-)
-WITH (autovacuum_enabled = off);
-INSERT INTO mcv_lists_arrays (a, b, c)
-     SELECT
-         ARRAY[md5((i/100)::text), md5((i/100-1)::text), md5((i/100+1)::text)],
-         ARRAY[(i/100-1)::numeric/1000, (i/100)::numeric/1000, (i/100+1)::numeric/1000],
-         ARRAY[(i/100-1), i/100, (i/100+1)]
-     FROM generate_series(1,5000) s(i);
-CREATE STATISTICS mcv_lists_arrays_stats (mcv) ON a, b, c
-  FROM mcv_lists_arrays;
-ANALYZE mcv_lists_arrays;
--- mcv with bool
-CREATE TABLE mcv_lists_bool (
-    a BOOL,
-    b BOOL,
-    c BOOL
-)
-WITH (autovacuum_enabled = off);
-INSERT INTO mcv_lists_bool (a, b, c)
-     SELECT
-         (mod(i,2) = 0), (mod(i,4) = 0), (mod(i,8) = 0)
-     FROM generate_series(1,10000) s(i);
-ANALYZE mcv_lists_bool;
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE a AND b AND c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a = a');
  estimated | actual 
 -----------+--------
-       156 |   1250
+        50 |   2500
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT a = a');
  estimated | actual 
 -----------+--------
-       156 |      0
+      4975 |   2500
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND NOT b AND c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
  estimated | actual 
 -----------+--------
-       469 |      0
+      4975 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a > a');
  estimated | actual 
 -----------+--------
-      1094 |      0
+      1658 |      0
 (1 row)
 
-CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
-  FROM mcv_lists_bool;
-ANALYZE mcv_lists_bool;
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE a AND b AND c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a < a');
  estimated | actual 
 -----------+--------
-      1250 |   1250
+      1658 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a = a');
  estimated | actual 
 -----------+--------
          1 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND NOT b AND c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT a = a');
  estimated | actual 
 -----------+--------
-         1 |      0
+      4950 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a > a');
  estimated | actual 
 -----------+--------
-         1 |      0
+      4983 |      0
 (1 row)
 
--- mcv covering just a small fraction of data
-CREATE TABLE mcv_lists_partial (
-    a INT,
-    b INT,
-    c INT
-);
--- 10 frequent groups, each with 100 elements
-INSERT INTO mcv_lists_partial (a, b, c)
-     SELECT
-         mod(i,10),
-         mod(i,10),
-         mod(i,10)
-     FROM generate_series(0,999) s(i);
--- 100 groups that will make it to the MCV list (includes the 10 frequent ones)
-INSERT INTO mcv_lists_partial (a, b, c)
-     SELECT
-         i,
-         i,
-         i
-     FROM generate_series(0,99) s(i);
--- 4000 groups in total, most of which won't make it (just a single item)
-INSERT INTO mcv_lists_partial (a, b, c)
-     SELECT
-         i,
-         i,
-         i
-     FROM generate_series(0,3999) s(i);
-ANALYZE mcv_lists_partial;
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 AND b = 0 AND c = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a < a');
  estimated | actual 
 -----------+--------
-         1 |    102
+      4983 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a = a');
  estimated | actual 
 -----------+--------
-       300 |    102
+      4975 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 10 AND b = 10 AND c = 10');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT a = a');
  estimated | actual 
 -----------+--------
-         1 |      2
+      5000 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 10 OR b = 10 OR c = 10');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
  estimated | actual 
 -----------+--------
-         6 |      2
+      1667 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 AND b = 0 AND c = 10');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a > a');
  estimated | actual 
 -----------+--------
-         1 |      0
+       556 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 10');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a < a');
  estimated | actual 
 -----------+--------
-       204 |    104
+       556 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0 AND c = 0) OR (a = 1 AND b = 1 AND c = 1) OR (a = 2 AND b = 2 AND c = 2)');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a = a');
  estimated | actual 
 -----------+--------
-         1 |    306
+         1 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0) OR (a = 0 AND c = 0) OR (b = 0 AND c = 0)');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT a = a');
  estimated | actual 
 -----------+--------
-         6 |    102
+      1658 |      0
 (1 row)
 
-CREATE STATISTICS mcv_lists_partial_stats (mcv) ON a, b, c
-  FROM mcv_lists_partial;
-ANALYZE mcv_lists_partial;
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 AND b = 0 AND c = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a > a');
  estimated | actual 
 -----------+--------
-       102 |    102
+      2778 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a < a');
  estimated | actual 
 -----------+--------
-        96 |    102
+      2778 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 10 AND b = 10 AND c = 10');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a = a');
  estimated | actual 
 -----------+--------
-         2 |      2
+      1683 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 10 OR b = 10 OR c = 10');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT a = a');
  estimated | actual 
 -----------+--------
-         2 |      2
+      4983 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 AND b = 0 AND c = 10');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
  estimated | actual 
 -----------+--------
-         1 |      0
+      1667 |   2500
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 10');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a > a');
  estimated | actual 
 -----------+--------
-       102 |    104
+       556 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0 AND c = 0) OR (a = 1 AND b = 1 AND c = 1) OR (a = 2 AND b = 2 AND c = 2)');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a < a');
  estimated | actual 
 -----------+--------
-       306 |    306
+       556 |      0
 (1 row)
 
-SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0) OR (a = 0 AND c = 0) OR (b = 0 AND c = 0)');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a = a');
  estimated | actual 
 -----------+--------
-       108 |    102
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT a = a');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a > a');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a < a');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a = a');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT a = a');
+ estimated | actual 
+-----------+--------
+      4983 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a > a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a < a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a = a');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT a = a');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a > a');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a < a');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a = a');
+ estimated | actual 
+-----------+--------
+      1683 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT a = a');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
+ estimated | actual 
+-----------+--------
+      1667 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a > a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a < a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a = a');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT a = a');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a > a');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a < a');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a = a');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT a = a');
+ estimated | actual 
+-----------+--------
+      4983 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c > c');
+ estimated | actual 
+-----------+--------
+         8 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c < c');
+ estimated | actual 
+-----------+--------
+         8 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c = c');
+ estimated | actual 
+-----------+--------
+        12 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+        25 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c > c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c < c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      4975 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
+ estimated | actual 
+-----------+--------
+      4975 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c > c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c < c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c = c');
+ estimated | actual 
+-----------+--------
+      2488 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+      4950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c > c');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c < c');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c = c');
+ estimated | actual 
+-----------+--------
+      4975 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      5000 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c > c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c < c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c = c');
+ estimated | actual 
+-----------+--------
+       833 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c > c');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c < c');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c = c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
+ estimated | actual 
+-----------+--------
+      1667 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c > c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c < c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c = c');
+ estimated | actual 
+-----------+--------
+       833 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c > c');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c < c');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c = c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      4983 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c > c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c < c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c = c');
+ estimated | actual 
+-----------+--------
+       833 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c > c');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c < c');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c = c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
+ estimated | actual 
+-----------+--------
+      1667 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c > c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c < c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c = c');
+ estimated | actual 
+-----------+--------
+       833 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c > c');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c < c');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c = c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      4983 |   2500
+(1 row)
+
+-- create statistics
+CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
+ANALYZE mcv_lists;
+-- test pg_mcv_list_items with MCV list containing variable-length data and NULLs
+SELECT m.*
+  FROM pg_statistic_ext s, pg_statistic_ext_data d,
+       pg_mcv_list_items(d.stxdmcv) m
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
+ index |      values      |  nulls  | frequency | base_frequency 
+-------+------------------+---------+-----------+----------------
+     0 | {NULL,x,x}       | {t,f,f} |       0.5 |           0.25
+     1 | {NULL,NULL,NULL} | {t,t,t} |       0.5 |           0.25
+(2 rows)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = ''x'' OR d = ''x''');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''x'' OR d = ''x''');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
+ estimated | actual 
+-----------+--------
+      2500 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+ estimated | actual 
+-----------+--------
+        25 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a > a');
+ estimated | actual 
+-----------+--------
+         8 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a < a');
+ estimated | actual 
+-----------+--------
+         8 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a = a');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT a = a');
+ estimated | actual 
+-----------+--------
+        25 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a > a');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a < a');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a = a');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT a = a');
+ estimated | actual 
+-----------+--------
+      4975 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
+ estimated | actual 
+-----------+--------
+      4975 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a > a');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a < a');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a = a');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT a = a');
+ estimated | actual 
+-----------+--------
+      4950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a > a');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a < a');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a = a');
+ estimated | actual 
+-----------+--------
+      4975 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT a = a');
+ estimated | actual 
+-----------+--------
+      5000 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a > a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a < a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a = a');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT a = a');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a > a');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a < a');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a = a');
+ estimated | actual 
+-----------+--------
+      1683 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT a = a');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
+ estimated | actual 
+-----------+--------
+      1667 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a > a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a < a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a = a');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT a = a');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a > a');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a < a');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a = a');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT a = a');
+ estimated | actual 
+-----------+--------
+      4983 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a > a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a < a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a = a');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT a = a');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a > a');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a < a');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a = a');
+ estimated | actual 
+-----------+--------
+      1683 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT a = a');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
+ estimated | actual 
+-----------+--------
+      1667 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a > a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a < a');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a = a');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT a = a');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a > a');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a < a');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a = a');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT a = a');
+ estimated | actual 
+-----------+--------
+      4983 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c > c');
+ estimated | actual 
+-----------+--------
+         8 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c < c');
+ estimated | actual 
+-----------+--------
+         8 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c = c');
+ estimated | actual 
+-----------+--------
+        12 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+        25 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c > c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c < c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      4975 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
+ estimated | actual 
+-----------+--------
+      4975 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c > c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c < c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c = c');
+ estimated | actual 
+-----------+--------
+      2488 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+      4950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c > c');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c < c');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c = c');
+ estimated | actual 
+-----------+--------
+      4975 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      5000 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c > c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c < c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c = c');
+ estimated | actual 
+-----------+--------
+       833 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c > c');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c < c');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c = c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
+ estimated | actual 
+-----------+--------
+      1667 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c > c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c < c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c = c');
+ estimated | actual 
+-----------+--------
+       833 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c > c');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c < c');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c = c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      4983 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c > c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c < c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c = c');
+ estimated | actual 
+-----------+--------
+       833 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c > c');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c < c');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c = c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      4983 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
+ estimated | actual 
+-----------+--------
+      1667 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c > c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c < c');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c = c');
+ estimated | actual 
+-----------+--------
+       833 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT c = c');
+ estimated | actual 
+-----------+--------
+      1658 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c > c');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c < c');
+ estimated | actual 
+-----------+--------
+      2778 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c = c');
+ estimated | actual 
+-----------+--------
+      1683 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT c = c');
+ estimated | actual 
+-----------+--------
+      4983 |   2500
+(1 row)
+
+-- mcv with pass-by-ref fixlen types, e.g. uuid
+CREATE TABLE mcv_lists_uuid (
+    a UUID,
+    b UUID,
+    c UUID
+)
+WITH (autovacuum_enabled = off);
+INSERT INTO mcv_lists_uuid (a, b, c)
+     SELECT
+         md5(mod(i,100)::text)::uuid,
+         md5(mod(i,50)::text)::uuid,
+         md5(mod(i,25)::text)::uuid
+     FROM generate_series(1,5000) s(i);
+ANALYZE mcv_lists_uuid;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_uuid WHERE a = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND b = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc''');
+ estimated | actual 
+-----------+--------
+         1 |     50
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_uuid WHERE a = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND b = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND c = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc''');
+ estimated | actual 
+-----------+--------
+         1 |     50
+(1 row)
+
+CREATE STATISTICS mcv_lists_uuid_stats (mcv) ON a, b, c
+  FROM mcv_lists_uuid;
+ANALYZE mcv_lists_uuid;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_uuid WHERE a = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND b = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc''');
+ estimated | actual 
+-----------+--------
+        50 |     50
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_uuid WHERE a = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND b = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc'' AND c = ''1679091c-5a88-0faf-6fb5-e6087eb1b2dc''');
+ estimated | actual 
+-----------+--------
+        50 |     50
+(1 row)
+
+DROP TABLE mcv_lists_uuid;
+-- mcv with arrays
+CREATE TABLE mcv_lists_arrays (
+    a TEXT[],
+    b NUMERIC[],
+    c INT[]
+)
+WITH (autovacuum_enabled = off);
+INSERT INTO mcv_lists_arrays (a, b, c)
+     SELECT
+         ARRAY[md5((i/100)::text), md5((i/100-1)::text), md5((i/100+1)::text)],
+         ARRAY[(i/100-1)::numeric/1000, (i/100)::numeric/1000, (i/100+1)::numeric/1000],
+         ARRAY[(i/100-1), i/100, (i/100+1)]
+     FROM generate_series(1,5000) s(i);
+CREATE STATISTICS mcv_lists_arrays_stats (mcv) ON a, b, c
+  FROM mcv_lists_arrays;
+ANALYZE mcv_lists_arrays;
+-- mcv with bool
+CREATE TABLE mcv_lists_bool (
+    a BOOL,
+    b BOOL,
+    c BOOL
+)
+WITH (autovacuum_enabled = off);
+INSERT INTO mcv_lists_bool (a, b, c)
+     SELECT
+         (mod(i,2) = 0), (mod(i,4) = 0), (mod(i,8) = 0)
+     FROM generate_series(1,10000) s(i);
+ANALYZE mcv_lists_bool;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE a AND b AND c');
+ estimated | actual 
+-----------+--------
+       156 |   1250
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND c');
+ estimated | actual 
+-----------+--------
+       156 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND NOT b AND c');
+ estimated | actual 
+-----------+--------
+       469 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
+ estimated | actual 
+-----------+--------
+      1094 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a');
+ estimated | actual 
+-----------+--------
+      9950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a');
+ estimated | actual 
+-----------+--------
+     10000 |  10000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a');
+ estimated | actual 
+-----------+--------
+      3333 |  10000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a');
+ estimated | actual 
+-----------+--------
+      3333 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a');
+ estimated | actual 
+-----------+--------
+      3333 |  10000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a');
+ estimated | actual 
+-----------+--------
+      3333 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      9950 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != b');
+ estimated | actual 
+-----------+--------
+        50 |   7500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < b');
+ estimated | actual 
+-----------+--------
+      3333 |  10000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= b');
+ estimated | actual 
+-----------+--------
+      3333 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > b');
+ estimated | actual 
+-----------+--------
+      3333 |   7500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= b');
+ estimated | actual 
+-----------+--------
+      3333 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a AND b = b');
+ estimated | actual 
+-----------+--------
+      9950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a AND b != b');
+ estimated | actual 
+-----------+--------
+      9950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a AND b < b');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= b');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > b');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= b');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != b AND b != c');
+ estimated | actual 
+-----------+--------
+        50 |   1250
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < b AND b < c');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= b AND b <= c');
+ estimated | actual 
+-----------+--------
+      1111 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > b AND b > c');
+ estimated | actual 
+-----------+--------
+      1111 |   1250
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= b AND b >= c');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = c AND b = b');
+ estimated | actual 
+-----------+--------
+      9950 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != c AND b != b');
+ estimated | actual 
+-----------+--------
+        50 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < c AND b < b');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= c');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > c');
+ estimated | actual 
+-----------+--------
+      1111 |   1250
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= c');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
+  FROM mcv_lists_bool;
+ANALYZE mcv_lists_bool;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE a AND b AND c');
+ estimated | actual 
+-----------+--------
+      1250 |   1250
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND c');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND NOT b AND c');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a');
+ estimated | actual 
+-----------+--------
+      9950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a');
+ estimated | actual 
+-----------+--------
+     10000 |  10000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a');
+ estimated | actual 
+-----------+--------
+      3333 |  10000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a');
+ estimated | actual 
+-----------+--------
+      3333 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a');
+ estimated | actual 
+-----------+--------
+      3333 |  10000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a');
+ estimated | actual 
+-----------+--------
+      3333 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      9950 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != b');
+ estimated | actual 
+-----------+--------
+        50 |   7500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < b');
+ estimated | actual 
+-----------+--------
+      3333 |  10000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= b');
+ estimated | actual 
+-----------+--------
+      3333 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > b');
+ estimated | actual 
+-----------+--------
+      3333 |   7500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= b');
+ estimated | actual 
+-----------+--------
+      3333 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a AND b = b');
+ estimated | actual 
+-----------+--------
+      9950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a AND b != b');
+ estimated | actual 
+-----------+--------
+      9950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a AND b < b');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= b');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > b');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= b');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != b AND b != c');
+ estimated | actual 
+-----------+--------
+        50 |   1250
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < b AND b < c');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= b AND b <= c');
+ estimated | actual 
+-----------+--------
+      1111 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > b AND b > c');
+ estimated | actual 
+-----------+--------
+      1111 |   1250
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= b AND b >= c');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = c AND b = b');
+ estimated | actual 
+-----------+--------
+      9950 |   3750
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != c AND b != b');
+ estimated | actual 
+-----------+--------
+        50 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < c AND b < b');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= c');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > c');
+ estimated | actual 
+-----------+--------
+      1111 |   1250
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= c');
+ estimated | actual 
+-----------+--------
+      1111 |      0
+(1 row)
+
+-- mcv covering just a small fraction of data
+CREATE TABLE mcv_lists_partial (
+    a INT,
+    b INT,
+    c INT
+);
+-- 10 frequent groups, each with 100 elements
+INSERT INTO mcv_lists_partial (a, b, c)
+     SELECT
+         mod(i,10),
+         mod(i,10),
+         mod(i,10)
+     FROM generate_series(0,999) s(i);
+-- 100 groups that will make it to the MCV list (includes the 10 frequent ones)
+INSERT INTO mcv_lists_partial (a, b, c)
+     SELECT
+         i,
+         i,
+         i
+     FROM generate_series(0,99) s(i);
+-- 4000 groups in total, most of which won't make it (just a single item)
+INSERT INTO mcv_lists_partial (a, b, c)
+     SELECT
+         i,
+         i,
+         i
+     FROM generate_series(0,3999) s(i);
+ANALYZE mcv_lists_partial;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 AND b = 0 AND c = 0');
+ estimated | actual 
+-----------+--------
+         1 |    102
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 0');
+ estimated | actual 
+-----------+--------
+       300 |    102
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 10 AND b = 10 AND c = 10');
+ estimated | actual 
+-----------+--------
+         1 |      2
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 10 OR b = 10 OR c = 10');
+ estimated | actual 
+-----------+--------
+         6 |      2
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 AND b = 0 AND c = 10');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 10');
+ estimated | actual 
+-----------+--------
+       204 |    104
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0 AND c = 0) OR (a = 1 AND b = 1 AND c = 1) OR (a = 2 AND b = 2 AND c = 2)');
+ estimated | actual 
+-----------+--------
+         1 |    306
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0) OR (a = 0 AND c = 0) OR (b = 0 AND c = 0)');
+ estimated | actual 
+-----------+--------
+         6 |    102
+(1 row)
+
+CREATE STATISTICS mcv_lists_partial_stats (mcv) ON a, b, c
+  FROM mcv_lists_partial;
+ANALYZE mcv_lists_partial;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 AND b = 0 AND c = 0');
+ estimated | actual 
+-----------+--------
+       102 |    102
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 0');
+ estimated | actual 
+-----------+--------
+        96 |    102
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 10 AND b = 10 AND c = 10');
+ estimated | actual 
+-----------+--------
+         2 |      2
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 10 OR b = 10 OR c = 10');
+ estimated | actual 
+-----------+--------
+         2 |      2
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 AND b = 0 AND c = 10');
+ estimated | actual 
+-----------+--------
+         1 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 10');
+ estimated | actual 
+-----------+--------
+       102 |    104
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0 AND c = 0) OR (a = 1 AND b = 1 AND c = 1) OR (a = 2 AND b = 2 AND c = 2)');
+ estimated | actual 
+-----------+--------
+       306 |    306
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0) OR (a = 0 AND c = 0) OR (b = 0 AND c = 0)');
+ estimated | actual 
+-----------+--------
+       108 |    102
+(1 row)
+
+DROP TABLE mcv_lists_partial;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+ estimated | actual 
+-----------+--------
+      9950 |   2500
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+ estimated | actual 
+-----------+--------
+        50 |   2500
 (1 row)
 
-DROP TABLE mcv_lists_partial;
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -2885,6 +4685,126 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       2649 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+         1 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != d');
+ estimated | actual 
+-----------+--------
+      4950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b AND c > d');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND c >= d');
+ estimated | actual 
+-----------+--------
+       556 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b OR c = d');
+ estimated | actual 
+-----------+--------
+        50 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != d');
+ estimated | actual 
+-----------+--------
+      5000 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b OR c > d');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR c >= d');
+ estimated | actual 
+-----------+--------
+      2778 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a AND c = d');
+ estimated | actual 
+-----------+--------
+        25 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != c');
+ estimated | actual 
+-----------+--------
+      4950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a AND c > d');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND d >= c');
+ estimated | actual 
+-----------+--------
+       556 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a OR c = d');
+ estimated | actual 
+-----------+--------
+        50 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != c');
+ estimated | actual 
+-----------+--------
+      5000 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a OR c > d');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR d >= c');
+ estimated | actual 
+-----------+--------
+      2778 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a > b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a >= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
 CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
@@ -2931,6 +4851,126 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
       1571 |   1572
 (1 row)
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+ estimated | actual 
+-----------+--------
+         1 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != d');
+ estimated | actual 
+-----------+--------
+      4950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b AND c > d');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND c >= d');
+ estimated | actual 
+-----------+--------
+       556 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b OR c = d');
+ estimated | actual 
+-----------+--------
+        50 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != d');
+ estimated | actual 
+-----------+--------
+      5000 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b OR c > d');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR c >= d');
+ estimated | actual 
+-----------+--------
+      2778 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a AND c = d');
+ estimated | actual 
+-----------+--------
+        25 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != c');
+ estimated | actual 
+-----------+--------
+      4950 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a AND c > d');
+ estimated | actual 
+-----------+--------
+       556 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND d >= c');
+ estimated | actual 
+-----------+--------
+       556 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a OR c = d');
+ estimated | actual 
+-----------+--------
+        50 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != c');
+ estimated | actual 
+-----------+--------
+      5000 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a OR c > d');
+ estimated | actual 
+-----------+--------
+      2778 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR d >= c');
+ estimated | actual 
+-----------+--------
+      2778 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a > b');
+ estimated | actual 
+-----------+--------
+      1667 |      0
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a >= b');
+ estimated | actual 
+-----------+--------
+      1667 |   5000
+(1 row)
+
 DROP TABLE mcv_lists_multi;
 -- statistics on integer expressions
 CREATE TABLE expr_stats (a int, b int, c int);
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 6b954c9e500..1b4d57e0462 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -989,6 +989,30 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -1042,6 +1066,30 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= c');
+
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52, NULL) AND b IN ( ''1'', ''2'', NULL)');
@@ -1241,6 +1289,221 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT c = c');
+
+
 -- create statistics
 CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, d FROM mcv_lists;
 
@@ -1259,6 +1522,219 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = '
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND (b = ''x'' OR d = ''x'')');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT a = a');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT c = c');
+
 -- mcv with pass-by-ref fixlen types, e.g. uuid
 CREATE TABLE mcv_lists_uuid (
     a UUID,
@@ -1334,6 +1810,66 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a AND b = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a AND b != b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a AND b < b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != b AND b != c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < b AND b < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= b AND b <= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > b AND b > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= b AND b >= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = c AND b = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != c AND b != b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < c AND b < b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= c');
+
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
   FROM mcv_lists_bool;
 
@@ -1347,6 +1883,66 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a AND b = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a AND b != b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a AND b < b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != b AND b != c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < b AND b < c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= b AND b <= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > b AND b > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= b AND b >= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = c AND b = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != c AND b != b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < c AND b < b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= c');
+
 -- mcv covering just a small fraction of data
 CREATE TABLE mcv_lists_partial (
     a INT,
@@ -1419,6 +2015,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0
 
 DROP TABLE mcv_lists_partial;
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
+
 -- check the ability to use multiple MCV lists
 CREATE TABLE mcv_lists_multi (
 	a INTEGER,
@@ -1447,6 +2047,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AN
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b AND c > d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND c >= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b OR c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b OR c > d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR c >= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a AND c > d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND d >= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a OR c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a OR c > d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR d >= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a > b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a >= b');
+
 -- create separate MCV statistics
 CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
 CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
@@ -1461,6 +2086,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 AN
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 AND b = 0) OR (c = 0 AND d = 0)');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
 
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b AND c > d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND c >= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b OR c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b OR c > d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR c >= d');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a AND c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a AND c > d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND d >= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a OR c = d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != c');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a OR c > d');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR d >= c');
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a > b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a >= b');
+
 DROP TABLE mcv_lists_multi;
 
 
-- 
2.34.3

0002-Improve-Var-op-Var-estimates-with-the-same--20220721.patchtext/x-patch; charset=UTF-8; name=0002-Improve-Var-op-Var-estimates-with-the-same--20220721.patchDownload
From 5c73433f80cace5aefbac5f913e9e69379220632 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Thu, 26 Aug 2021 23:01:04 +0200
Subject: [PATCH 2/4] Improve Var op Var estimates with the same Var

When estimating (Var op Var) clauses, we can treat the case with the
same Var on both sides as a special case, and we can provide better
selectivity estimate than for the generic case.

For example for (a = a) clause we know it's (1.0 - nullfrac), because
all non-NULL rows are expected to match. Similarly for (a != a), wich
has selectivity 0.0. And the same reasoning can be applied to inequality
comparisons, like (a < a) etc.

This of course depends on the operators having the usual meaning (e.g.
"=" equality) and not some custom operator with different behavior, but
if they use the same estimator it's reasonable to assume that (we
already do that in various other places).

In principle, such clauses are a bit strange and queries are unlikely to
use them very often. But query generators sometimes do this, and these
checks are quite cheap so it's likely a win.
---
 src/backend/utils/adt/selfuncs.c        | 103 +++++-
 src/test/regress/expected/stats_ext.out | 436 ++++++++++++------------
 2 files changed, 320 insertions(+), 219 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d35e5605de8..9289e9b8da8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -210,7 +210,8 @@ static bool get_actual_variable_endpoint(Relation heapRel,
 										 MemoryContext outercontext,
 										 Datum *endpointDatum);
 static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
-
+static bool matching_restriction_variables(PlannerInfo *root, List *args,
+										   int varRelid);
 
 /*
  *		eqsel			- Selectivity of "=" for any data types.
@@ -256,6 +257,31 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
 		}
 	}
 
+	/*
+	 * If it's (variable = variable) with the same variable on both sides, it's
+	 * a special case and we know it's not expected to filter anything, so we
+	 * estimate the selectivity as (1.0 - nullfrac), or 0.0 when it's negated.
+	 */
+	if (matching_restriction_variables(root, args, varRelid))
+	{
+		VariableStatData	vardata;
+		double				null_frac = 0.0;
+
+		examine_variable(root, linitial(args), varRelid, &vardata);
+
+		if (HeapTupleIsValid(vardata.statsTuple))
+		{
+			Form_pg_statistic	stats;
+
+			stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+			null_frac = stats->stanullfrac;
+		}
+
+		ReleaseVariableStats(vardata);
+
+		return (negate) ? 0.0 : (1.0 - null_frac);
+	}
+
 	/*
 	 * If expression is not variable = something or something = variable, then
 	 * punt and return a default estimate.
@@ -1408,6 +1434,22 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
 	Oid			consttype;
 	double		selec;
 
+	/*
+	 * Handle (variable < variable) and (variable <= variable) with the same
+	 * variable on both sides as a special case. The strict inequality should
+	 * not match any rows, hence selectivity is 0.0. The other case is about
+	 * the same as equality, so selectivity is 1.0.
+	 */
+	if (matching_restriction_variables(root, args, varRelid))
+	{
+		/* The case with equality matches all rows, so estimate it as 1.0. */
+		if (iseq)
+			PG_RETURN_FLOAT8(1.0);
+
+		/* Strict inequality matches nothing, so selectivity is 0.0. */
+		PG_RETURN_FLOAT8(0.0);
+	}
+
 	/*
 	 * If expression is not variable op something or something op variable,
 	 * then punt and return a default estimate.
@@ -4906,6 +4948,65 @@ get_restriction_variable(PlannerInfo *root, List *args, int varRelid,
 	return false;
 }
 
+/*
+ * matching_restriction_variable
+ *		Check if the two arguments of a restriction clause refer to the same
+ *		variable, i.e. if the condition is of the form (variable op variable).
+ *		We can deduce selectivity for such (in)equality clauses.
+ *
+ * Inputs:
+ *	root: the planner info
+ *	args: clause argument list
+ *	varRelid: see specs for restriction selectivity functions
+ *
+ * Returns true if the same variable is on both sides, otherwise false.
+ */
+static bool
+matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
+{
+	Node	   *left,
+			   *right;
+	Bitmapset  *varnos;
+
+	/* Fail if not a binary opclause (probably shouldn't happen) */
+	if (list_length(args) != 2)
+		return false;
+
+	left = (Node *) linitial(args);
+	right = (Node *) lsecond(args);
+
+	/* Look inside any binary-compatible relabeling */
+
+	if (IsA(left, RelabelType))
+		left = (Node *) ((RelabelType *) left)->arg;
+
+	if (IsA(right, RelabelType))
+		right = (Node *) ((RelabelType *) right)->arg;
+
+	/*
+	 * Check if it's safe to compare the two expressions. The expressions
+	 * must not contain any volatile expressions, and must belong to the
+	 * same relation.
+	 *
+	 * XXX We do a small trick - we validate just one expression, and then
+	 * check it's equal to the other side.
+	 */
+	if (contain_volatile_functions(left))
+		return false;
+
+	/* Check it belongs to a single relation. */
+	varnos = pull_varnos(root, left);
+
+	if (bms_num_members(varnos) != 1)
+		return false;
+
+	if ((varRelid != 0) && (!bms_is_member(varRelid, varnos)))
+		return false;
+
+	/* The two variables need to match */
+	return equal(left, right);
+}
+
 /*
  * get_join_variables
  *		Apply examine_variable() to each side of a join clause.
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 009c7cfd848..fea3d18442c 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1997,31 +1997,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = a');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != a');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= a');
  estimated | actual 
 -----------+--------
-      1667 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= a');
  estimated | actual 
 -----------+--------
-      1667 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = c');
@@ -2216,31 +2216,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = a');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != a');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= a');
  estimated | actual 
 -----------+--------
-      1667 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= a');
  estimated | actual 
 -----------+--------
-      1667 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = c');
@@ -2697,13 +2697,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a > a');
  estimated | actual 
 -----------+--------
-         8 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a < a');
  estimated | actual 
 -----------+--------
-         8 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a = a');
@@ -2715,31 +2715,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT a = a');
  estimated | actual 
 -----------+--------
-        25 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a > a');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a < a');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a = a');
  estimated | actual 
 -----------+--------
-        50 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4975 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
@@ -2751,13 +2751,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a > a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a < a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a = a');
@@ -2769,19 +2769,19 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT a = a');
  estimated | actual 
 -----------+--------
-      4950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a > a');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a < a');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a = a');
@@ -2793,7 +2793,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      5000 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
@@ -2805,13 +2805,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a > a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a < a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a = a');
@@ -2823,31 +2823,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT a = a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a > a');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a < a');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a = a');
  estimated | actual 
 -----------+--------
-      1683 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
@@ -2859,13 +2859,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a > a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a < a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a = a');
@@ -2877,31 +2877,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT a = a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a > a');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a < a');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a = a');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4983 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
@@ -2913,13 +2913,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a > a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a < a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a = a');
@@ -2931,31 +2931,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT a = a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a > a');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a < a');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a = a');
  estimated | actual 
 -----------+--------
-      1683 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
@@ -2967,13 +2967,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a > a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a < a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a = a');
@@ -2985,43 +2985,43 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT a = a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a > a');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a < a');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a = a');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4983 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c > c');
  estimated | actual 
 -----------+--------
-         8 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c < c');
  estimated | actual 
 -----------+--------
-         8 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c = c');
@@ -3033,31 +3033,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT c = c');
  estimated | actual 
 -----------+--------
-        25 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c > c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c < c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c = c');
  estimated | actual 
 -----------+--------
-        50 |   2500
+      2513 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4975 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
@@ -3069,13 +3069,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c > c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c < c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c = c');
@@ -3087,31 +3087,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT c = c');
  estimated | actual 
 -----------+--------
-      4950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c > c');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c < c');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c = c');
  estimated | actual 
 -----------+--------
-      4975 |   2500
+      4988 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      5000 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
@@ -3123,13 +3123,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c > c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c < c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c = c');
@@ -3141,31 +3141,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT c = c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c > c');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c < c');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c = c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      3333 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
@@ -3177,13 +3177,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c > c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c < c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c = c');
@@ -3195,31 +3195,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT c = c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c > c');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c < c');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c = c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      3333 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4983 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
@@ -3231,13 +3231,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c > c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c < c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c = c');
@@ -3249,31 +3249,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT c = c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c > c');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c < c');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c = c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      3333 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
@@ -3285,13 +3285,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c > c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c < c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c = c');
@@ -3303,31 +3303,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT c = c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c > c');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c < c');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c = c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      3333 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4983 |   2500
+      1667 |   2500
 (1 row)
 
 -- create statistics
@@ -3372,13 +3372,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a > a');
  estimated | actual 
 -----------+--------
-         8 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a < a');
  estimated | actual 
 -----------+--------
-         8 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a = a');
@@ -3390,31 +3390,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT a = a');
  estimated | actual 
 -----------+--------
-        25 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a > a');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a < a');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a = a');
  estimated | actual 
 -----------+--------
-        50 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4975 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
@@ -3426,13 +3426,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a > a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a < a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a = a');
@@ -3444,19 +3444,19 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT a = a');
  estimated | actual 
 -----------+--------
-      4950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a > a');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a < a');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a = a');
@@ -3468,7 +3468,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      5000 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
@@ -3480,13 +3480,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a > a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a < a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a = a');
@@ -3498,31 +3498,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT a = a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a > a');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a < a');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a = a');
  estimated | actual 
 -----------+--------
-      1683 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
@@ -3534,13 +3534,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a > a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a < a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a = a');
@@ -3552,31 +3552,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT a = a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a > a');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a < a');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a = a');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4983 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
@@ -3588,13 +3588,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a > a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a < a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a = a');
@@ -3606,31 +3606,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT a = a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a > a');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a < a');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a = a');
  estimated | actual 
 -----------+--------
-      1683 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
@@ -3642,13 +3642,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a > a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a < a');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a = a');
@@ -3660,43 +3660,43 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT a = a');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a > a');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a < a');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a = a');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4983 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c > c');
  estimated | actual 
 -----------+--------
-         8 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c < c');
  estimated | actual 
 -----------+--------
-         8 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c = c');
@@ -3708,31 +3708,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT c = c');
  estimated | actual 
 -----------+--------
-        25 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c > c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c < c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c = c');
  estimated | actual 
 -----------+--------
-        50 |   2500
+      2513 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4975 |   2500
+        25 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
@@ -3744,13 +3744,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c > c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c < c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c = c');
@@ -3762,31 +3762,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT c = c');
  estimated | actual 
 -----------+--------
-      4950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c > c');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c < c');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c = c');
  estimated | actual 
 -----------+--------
-      4975 |   2500
+      4988 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      5000 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
@@ -3798,13 +3798,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c > c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c < c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c = c');
@@ -3816,31 +3816,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT c = c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c > c');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c < c');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c = c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      3333 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
@@ -3852,13 +3852,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c > c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c < c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c = c');
@@ -3870,31 +3870,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT c = c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c > c');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c < c');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c = c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      3333 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4983 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
@@ -3906,13 +3906,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c > c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c < c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c = c');
@@ -3924,31 +3924,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT c = c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c > c');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c < c');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c = c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      3333 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4983 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
@@ -3960,13 +3960,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c > c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c < c');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c = c');
@@ -3978,31 +3978,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c =
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT c = c');
  estimated | actual 
 -----------+--------
-      1658 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c > c');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c < c');
  estimated | actual 
 -----------+--------
-      2778 |   2500
+      1667 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c = c');
  estimated | actual 
 -----------+--------
-      1683 |   2500
+      3333 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4983 |   2500
+      1667 |   2500
 (1 row)
 
 -- mcv with pass-by-ref fixlen types, e.g. uuid
@@ -4102,7 +4102,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a');
  estimated | actual 
 -----------+--------
-      9950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a');
@@ -4114,25 +4114,25 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a !=
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a');
  estimated | actual 
 -----------+--------
-      3333 |  10000
+     10000 |  10000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a');
  estimated | actual 
 -----------+--------
-      3333 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a');
  estimated | actual 
 -----------+--------
-      3333 |  10000
+     10000 |  10000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a');
  estimated | actual 
 -----------+--------
-      3333 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
@@ -4174,37 +4174,37 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >=
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a AND b = b');
  estimated | actual 
 -----------+--------
-      9950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a AND b != b');
  estimated | actual 
 -----------+--------
-      9950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a AND b < b');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= b');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > b');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= b');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
@@ -4252,31 +4252,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = c
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != c AND b != b');
  estimated | actual 
 -----------+--------
-        50 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < c AND b < b');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= c');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > c');
  estimated | actual 
 -----------+--------
-      1111 |   1250
+      3333 |   1250
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= c');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 CREATE STATISTICS mcv_lists_bool_stats (mcv) ON a, b, c
@@ -4309,7 +4309,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a');
  estimated | actual 
 -----------+--------
-      9950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a');
@@ -4321,25 +4321,25 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a !=
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a');
  estimated | actual 
 -----------+--------
-      3333 |  10000
+     10000 |  10000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a');
  estimated | actual 
 -----------+--------
-      3333 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a');
  estimated | actual 
 -----------+--------
-      3333 |  10000
+     10000 |  10000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a');
  estimated | actual 
 -----------+--------
-      3333 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
@@ -4381,37 +4381,37 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >=
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a AND b = b');
  estimated | actual 
 -----------+--------
-      9950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != a AND b != b');
  estimated | actual 
 -----------+--------
-      9950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < a AND b < b');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= b');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > b');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= b');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
@@ -4459,31 +4459,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = c
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != c AND b != b');
  estimated | actual 
 -----------+--------
-        50 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < c AND b < b');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= a AND b <= c');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > c');
  estimated | actual 
 -----------+--------
-      1111 |   1250
+      3333 |   1250
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= c');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 -- mcv covering just a small fraction of data
@@ -4742,13 +4742,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a AN
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != c');
  estimated | actual 
 -----------+--------
-      4950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a AND c > d');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND d >= c');
@@ -4760,19 +4760,19 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b A
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a OR c = d');
  estimated | actual 
 -----------+--------
-        50 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != c');
  estimated | actual 
 -----------+--------
-      5000 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a OR c > d');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR d >= c');
@@ -4908,13 +4908,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a AN
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != c');
  estimated | actual 
 -----------+--------
-      4950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a AND c > d');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND d >= c');
@@ -4926,19 +4926,19 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b A
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a OR c = d');
  estimated | actual 
 -----------+--------
-        50 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != c');
  estimated | actual 
 -----------+--------
-      5000 |      0
+      4975 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a OR c > d');
  estimated | actual 
 -----------+--------
-      2778 |      0
+      1667 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR d >= c');
-- 
2.34.3

0003-Modify-selectivity-estimation-for-Var-op-Va-20220721.patchtext/x-patch; charset=UTF-8; name=0003-Modify-selectivity-estimation-for-Var-op-Va-20220721.patchDownload
From 0848ae05d4d1e1c2fbefaa5a43a4910efe9bfe98 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 15 Aug 2021 13:23:13 +0200
Subject: [PATCH 3/4] Modify selectivity estimation for Var op Var

This modifies estimation of (Var op Var) clauses with distinct Vars from
the same relation. We recognize the clauses as compatible with extended
statistics, and estimate them using the extended MCV etc.
---
 src/backend/optimizer/path/clausesel.c        |  37 +++-
 src/backend/statistics/extended_stats.c       |  83 +++++--
 src/backend/statistics/mcv.c                  | 172 +++++++++++----
 .../statistics/extended_stats_internal.h      |   4 +-
 src/test/regress/expected/stats_ext.out       | 208 +++++++++---------
 5 files changed, 323 insertions(+), 181 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 06f836308d0..90ee73f89df 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -714,6 +714,7 @@ clause_selectivity_ext(PlannerInfo *root,
 	Selectivity s1 = 0.5;		/* default for any unhandled clause type */
 	RestrictInfo *rinfo = NULL;
 	bool		cacheable = false;
+	Node	   *src = clause;
 
 	if (clause == NULL)			/* can this still happen? */
 		return s1;
@@ -871,11 +872,37 @@ clause_selectivity_ext(PlannerInfo *root,
 		}
 		else
 		{
-			/* Estimate selectivity for a restriction clause. */
-			s1 = restriction_selectivity(root, opno,
-										 opclause->args,
-										 opclause->inputcollid,
-										 varRelid);
+			/*
+			 * It might be a single (Expr op Expr) clause, which goes here due
+			 * to the optimization at the beginning of clauselist_selectivity.
+			 * So we try applying extended stats first, and then fall back to
+			 * restriction_selectivity.
+			 */
+			bool	estimated = false;
+
+			if (use_extended_stats)
+			{
+				List	   *clauses = list_make1(src);
+				RelOptInfo *rel = find_single_rel_for_clauses(root, clauses);
+
+				if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+				{
+					Bitmapset  *estimatedclauses = NULL;
+
+					s1 = statext_clauselist_selectivity(root, clauses, varRelid,
+														jointype, sjinfo, rel,
+														&estimatedclauses, false);
+
+					estimated = (bms_num_members(estimatedclauses) == 1);
+				}
+			}
+
+			/* Estimate selectivity for a restriction clause (fallback). */
+			if (!estimated)
+				s1 = restriction_selectivity(root, opno,
+											 opclause->args,
+											 opclause->inputcollid,
+											 varRelid);
 		}
 
 		/*
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index d2aa8d0ca3f..7bfde068aff 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1352,19 +1352,27 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		return true;
 	}
 
-	/* (Var/Expr op Const) or (Const op Var/Expr) */
+	/*
+	 * Three opclause variants are supported: (Expr op Const), (Const op Expr),
+	 * (Expr op Expr). That means we may need to analyze one or two expressions
+	 * to make sure the opclause is compatible with extended stats.
+	 */
 	if (is_opclause(clause))
 	{
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
-		Node	   *clause_expr;
+		ListCell   *lc;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
-		/* Check if the expression has the right shape */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		/*
+		 * Check if the expression has the right shape. This returns either one
+		 * or two expressions, depending on whether there is a Const.
+		 */
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
 		/*
@@ -1404,13 +1412,31 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			!get_func_leakproof(get_opcode(expr->opno)))
 			return false;
 
-		/* Check (Var op Const) or (Const op Var) clauses by recursing. */
-		if (IsA(clause_expr, Var))
-			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+		/*
+		 * There's always at least one expression, otherwise the clause would
+		 * not be considered compatible.
+		 */
+		Assert(list_length(clause_exprs) >= 1);
+
+		/*
+		 * Check all expressions by recursing. Var expressions are handled as
+		 * a special case (to match it to attnums etc.)
+		 */
+		foreach (lc, clause_exprs)
+		{
+			Node *clause_expr = (Node *) lfirst(lc);
+
+			if (IsA(clause_expr, Var))
+			{
+				/* if the Var is incompatible, the whole clause is incompatible */
+				if (!statext_is_compatible_clause_internal(root, clause_expr,
+														   relid, attnums, exprs))
+					return false;
+			}
+			else	/* generic expression */
+				*exprs = lappend(*exprs, clause_expr);
+		}
 
-		/* Otherwise we have (Expr op Const) or (Const op Expr). */
-		*exprs = lappend(*exprs, clause_expr);
 		return true;
 	}
 
@@ -1420,15 +1446,21 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) clause;
 		Node	   *clause_expr;
+		List	   *clause_exprs;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
 			return false;
 
+		/* There has to be one expression exactly. */
+		Assert(list_length(clause_exprs) == 1);
+
+		clause_expr = (Node *) linitial(clause_exprs);
+
 		/*
 		 * If it's not one of the supported operators ("=", "<", ">", etc.),
 		 * just ignore the clause, as it's not compatible with MCV lists.
@@ -2015,20 +2047,19 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  * examine_opclause_args
  *		Split an operator expression's arguments into Expr and Const parts.
  *
- * Attempts to match the arguments to either (Expr op Const) or (Const op
- * Expr), possibly with a RelabelType on top. When the expression matches this
- * form, returns true, otherwise returns false.
+ * Attempts to match the arguments to either (Expr op Const) or (Const op Expr)
+ * or (Expr op Expr), possibly with a RelabelType on top. When the expression
+ * matches this form, returns true, otherwise returns false.
  *
  * Optionally returns pointers to the extracted Expr/Const nodes, when passed
- * non-null pointers (exprp, cstp and expronleftp). The expronleftp flag
+ * non-null pointers (exprsp, cstp and expronleftp). The expronleftp flag
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, Node **exprp, Const **cstp,
-					  bool *expronleftp)
+examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp)
 {
-	Node	   *expr;
-	Const	   *cst;
+	List	   *exprs = NIL;
+	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
@@ -2048,22 +2079,26 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 
 	if (IsA(rightop, Const))
 	{
-		expr = (Node *) leftop;
+		exprs = lappend(exprs, leftop);
 		cst = (Const *) rightop;
 		expronleft = true;
 	}
 	else if (IsA(leftop, Const))
 	{
-		expr = (Node *) rightop;
+		exprs = lappend(exprs, rightop);
 		cst = (Const *) leftop;
 		expronleft = false;
 	}
 	else
-		return false;
+	{
+		exprs = lappend(exprs, leftop);
+		exprs = lappend(exprs, rightop);
+		expronleft = false;
+	}
 
 	/* return pointers to the extracted parts if requested */
-	if (exprp)
-		*exprp = expr;
+	if (exprsp)
+		*exprsp = exprs;
 
 	if (cstp)
 		*cstp = cst;
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index f10642df4f7..5806cfbb564 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1645,78 +1645,154 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			Node	   *clause_expr2;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
-			int			idx;
 			Oid			collid;
 
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
-			/* match the attribute/expression to a dimension of the statistic */
-			idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+			if (cst)	/* Expr op Const */
+			{
+				int idx;
 
-			Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert(list_length(clause_exprs) == 1);
+				clause_expr = (Node *) linitial(clause_exprs);
 
-			/*
-			 * Walk through the MCV items and evaluate the current clause. We
-			 * can skip items that were already ruled out, and terminate if
-			 * there are no remaining MCV items that might possibly match.
-			 */
-			for (i = 0; i < mcvlist->nitems; i++)
-			{
-				bool		match = true;
-				MCVItem    *item = &mcvlist->items[i];
+				/* match the attribute/expression to a dimension of the statistic */
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
 
-				Assert(idx >= 0);
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * When the MCV item or the Const value is NULL we can treat
-				 * this as a mismatch. We must not call the operator because
-				 * of strictness.
+				 * Walk through the MCV items and evaluate the current clause. We
+				 * can skip items that were already ruled out, and terminate if
+				 * there are no remaining MCV items that might possibly match.
 				 */
-				if (item->isnull[idx] || cst->constisnull)
+				for (i = 0; i < mcvlist->nitems; i++)
 				{
-					matches[i] = RESULT_MERGE(matches[i], is_or, false);
-					continue;
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					Assert(idx >= 0);
+
+					/*
+					 * When the MCV item or the Const value is NULL we can treat
+					 * this as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || cst->constisnull)
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap. Once
+					 * the value gets false for AND-lists, or true for OR-lists,
+					 * we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics, but
+					 * we can use the collation for the attribute itself, as
+					 * stored in varcollid. We do reset the statistics after a
+					 * type change (including collation change), so this is OK.
+					 * For expressions, we use the collation extracted from the
+					 * expression itself.
+					 */
+					if (expronleft)
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   item->values[idx],
+															   cst->constvalue));
+					else
+						match = DatumGetBool(FunctionCall2Coll(&opproc,
+															   collid,
+															   cst->constvalue,
+															   item->values[idx]));
+
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
 				}
+			}
+			else	/* Expr op Expr */
+			{
+				int			idx;
+				int			idx2;
+
+				Assert(list_length(clause_exprs) == 2);
+
+				clause_expr = (Node *) linitial(clause_exprs);
+				clause_expr2 = (Node *) lsecond(clause_exprs);
+
+				Assert(clause_expr2);
+				Assert(!expronleft);
 
 				/*
-				 * Skip MCV items that can't change result in the bitmap. Once
-				 * the value gets false for AND-lists, or true for OR-lists,
-				 * we don't need to look at more clauses.
+				 * Match the expressions to a dimension of the statistic.
+				 *
+				 * XXX Can the collations differ?
 				 */
-				if (RESULT_IS_FINAL(matches[i], is_or))
-					continue;
+				idx = mcv_match_expression(clause_expr, keys, exprs, &collid);
+				idx2 = mcv_match_expression(clause_expr2, keys, exprs, &collid);
+
+				Assert((idx >= 0) && (idx < bms_num_members(keys) + list_length(exprs)));
+				Assert((idx2 >= 0) && (idx2 < bms_num_members(keys) + list_length(exprs)));
 
 				/*
-				 * First check whether the constant is below the lower
-				 * boundary (in that case we can skip the bucket, because
-				 * there's no overlap).
-				 *
-				 * We don't store collations used to build the statistics, but
-				 * we can use the collation for the attribute itself, as
-				 * stored in varcollid. We do reset the statistics after a
-				 * type change (including collation change), so this is OK.
-				 * For expressions, we use the collation extracted from the
-				 * expression itself.
+				 * Walk through the MCV items and evaluate the current clause.
+				 * We can skip items that were already ruled out, and
+				 * terminate if there are no remaining MCV items that might
+				 * possibly match.
 				 */
-				if (expronleft)
+				for (i = 0; i < mcvlist->nitems; i++)
+				{
+					bool		match = true;
+					MCVItem    *item = &mcvlist->items[i];
+
+					/*
+					 * When either of the MCV items is NULL we can treat this
+					 * as a mismatch. We must not call the operator because
+					 * of strictness.
+					 */
+					if (item->isnull[idx] || item->isnull[idx2])
+					{
+						matches[i] = RESULT_MERGE(matches[i], is_or, false);
+						continue;
+					}
+
+					/*
+					 * Skip MCV items that can't change result in the bitmap.
+					 * Once the value gets false for AND-lists, or true for
+					 * OR-lists, we don't need to look at more clauses.
+					 */
+					if (RESULT_IS_FINAL(matches[i], is_or))
+						continue;
+
+					/*
+					 * We don't store collations used to build the statistics,
+					 * but we can use the collation for the attribute itself,
+					 * as stored in varcollid. We do reset the statistics after
+					 * a type change (including collation change), so this is
+					 * OK. We may need to relax this after allowing extended
+					 * statistics on expressions.
+					 */
 					match = DatumGetBool(FunctionCall2Coll(&opproc,
 														   collid,
 														   item->values[idx],
-														   cst->constvalue));
-				else
-					match = DatumGetBool(FunctionCall2Coll(&opproc,
-														   collid,
-														   cst->constvalue,
-														   item->values[idx]));
+														   item->values[idx2]));
 
-				/* update the match bitmap with the result */
-				matches[i] = RESULT_MERGE(matches[i], is_or, match);
+					/* update the match bitmap with the result */
+					matches[i] = RESULT_MERGE(matches[i], is_or, match);
+				}
 			}
 		}
 		else if (IsA(clause, ScalarArrayOpExpr))
@@ -1726,6 +1802,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
 			/* valid only after examine_opclause_args returns true */
 			Node	   *clause_expr;
+			List	   *clause_exprs;
 			Const	   *cst;
 			bool		expronleft;
 			Oid			collid;
@@ -1743,11 +1820,14 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_expr, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
 			Assert(expronleft);
+			Assert(list_length(clause_exprs) == 1);
+
+			clause_expr = (Node *) linitial(clause_exprs);
 
 			/* XXX what if (cst->constisnull == NULL)? */
 			if (!cst->constisnull)
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 71f852c157b..bd4bf7fdf14 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -99,8 +99,8 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									MultiSortSupport mss,
 									int numattrs, AttrNumber *attnums);
 
-extern bool examine_opclause_args(List *args, Node **exprp,
-								  Const **cstp, bool *expronleftp);
+extern bool examine_opclause_args(List *args, List **exprs, Const **cstp,
+								  bool *expronleftp);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
 											 Selectivity mcv_sel,
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index fea3d18442c..73ba59d75f1 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -2246,37 +2246,37 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= a');
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = c');
  estimated | actual 
 -----------+--------
-        25 |   1250
+      1250 |   1250
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a != c');
  estimated | actual 
 -----------+--------
-      4975 |   3750
+      3750 |   3750
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a > c');
  estimated | actual 
 -----------+--------
-      1667 |   3750
+      3750 |   3750
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a >= c');
  estimated | actual 
 -----------+--------
-      1667 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < c');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= c');
  estimated | actual 
 -----------+--------
-      1667 |   1250
+      1250 |   1250
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IN (1, 2, 51, 52) AND b IN ( ''1'', ''2'')');
@@ -3366,7 +3366,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d');
  estimated | actual 
 -----------+--------
-        25 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND a > a');
@@ -3396,31 +3396,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a > a');
  estimated | actual 
 -----------+--------
-        25 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a < a');
  estimated | actual 
 -----------+--------
-        25 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR a = a');
  estimated | actual 
 -----------+--------
-        25 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT a = a');
  estimated | actual 
 -----------+--------
-        25 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND a > a');
@@ -3450,31 +3450,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a > a');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a < a');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR a = a');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND a > a');
@@ -3504,31 +3504,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a > a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a < a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR a = a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND a > a');
@@ -3558,31 +3558,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a > a');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a < a');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR a = a');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND a > a');
@@ -3612,31 +3612,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a > a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a < a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR a = a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND a > a');
@@ -3666,25 +3666,25 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a > a');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a < a');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR a = a');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT a = a');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c > c');
@@ -3702,7 +3702,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c <
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND c = c');
  estimated | actual 
 -----------+--------
-        12 |   2500
+      1250 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT c = c');
@@ -3714,31 +3714,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c > c');
  estimated | actual 
 -----------+--------
-        25 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c < c');
  estimated | actual 
 -----------+--------
-        25 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR c = c');
  estimated | actual 
 -----------+--------
-      2513 |   2500
+      3750 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b = d OR NOT c = c');
  estimated | actual 
 -----------+--------
-        25 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c > c');
@@ -3756,7 +3756,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c <
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND c = c');
  estimated | actual 
 -----------+--------
-      2488 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT c = c');
@@ -3768,31 +3768,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c > c');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c < c');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR c = c');
  estimated | actual 
 -----------+--------
-      4988 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b != d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c > c');
@@ -3810,7 +3810,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c <
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND c = c');
  estimated | actual 
 -----------+--------
-       833 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT c = c');
@@ -3822,31 +3822,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c > c');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c < c');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR c = c');
  estimated | actual 
 -----------+--------
-      3333 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b < d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c > c');
@@ -3864,7 +3864,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c <
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND c = c');
  estimated | actual 
 -----------+--------
-       833 |   2500
+      1250 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT c = c');
@@ -3876,31 +3876,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c > c');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c < c');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR c = c');
  estimated | actual 
 -----------+--------
-      3333 |   2500
+      3750 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b <= d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c > c');
@@ -3918,7 +3918,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c <
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND c = c');
  estimated | actual 
 -----------+--------
-       833 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT c = c');
@@ -3930,31 +3930,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c > c');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c < c');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR c = c');
  estimated | actual 
 -----------+--------
-      3333 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b > d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c > c');
@@ -3972,7 +3972,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c <
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND c = c');
  estimated | actual 
 -----------+--------
-       833 |   2500
+      1250 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT c = c');
@@ -3984,25 +3984,25 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d AND NOT
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c > c');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c < c');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR c = c');
  estimated | actual 
 -----------+--------
-      3333 |   2500
+      3750 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE b >= d OR NOT c = c');
  estimated | actual 
 -----------+--------
-      1667 |   2500
+      2500 |   2500
 (1 row)
 
 -- mcv with pass-by-ref fixlen types, e.g. uuid
@@ -4345,37 +4345,37 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >=
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
  estimated | actual 
 -----------+--------
-      9950 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != b');
  estimated | actual 
 -----------+--------
-        50 |   7500
+      7500 |   7500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < b');
  estimated | actual 
 -----------+--------
-      3333 |  10000
+     10000 |  10000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= b');
  estimated | actual 
 -----------+--------
-      3333 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > b');
  estimated | actual 
 -----------+--------
-      3333 |   7500
+      7500 |   7500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= b');
  estimated | actual 
 -----------+--------
-      3333 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = a AND b = b');
@@ -4417,43 +4417,43 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >=
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
  estimated | actual 
 -----------+--------
-        50 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != b AND b != c');
  estimated | actual 
 -----------+--------
-        50 |   1250
+      1250 |   1250
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a < b AND b < c');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <= b AND b <= c');
  estimated | actual 
 -----------+--------
-      1111 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > b AND b > c');
  estimated | actual 
 -----------+--------
-      1111 |   1250
+      1250 |   1250
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= b AND b >= c');
  estimated | actual 
 -----------+--------
-      1111 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = c AND b = b');
  estimated | actual 
 -----------+--------
-      9950 |   3750
+      3750 |   3750
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a != c AND b != b');
@@ -4477,7 +4477,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a <=
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a > a AND b > c');
  estimated | actual 
 -----------+--------
-      3333 |   1250
+      1250 |   1250
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a >= a AND b >= c');
@@ -4617,13 +4617,13 @@ DROP TABLE mcv_lists_partial;
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b');
  estimated | actual 
 -----------+--------
-      9950 |   2500
+      2500 |   2500
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a = b AND b = c');
  estimated | actual 
 -----------+--------
-        50 |   2500
+      2500 |   2500
 (1 row)
 
 -- check the ability to use multiple MCV lists
@@ -4854,55 +4854,55 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b AND c = d');
  estimated | actual 
 -----------+--------
-         1 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != d');
  estimated | actual 
 -----------+--------
-      4950 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b AND c > d');
  estimated | actual 
 -----------+--------
-       556 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND c >= d');
  estimated | actual 
 -----------+--------
-       556 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = b OR c = d');
  estimated | actual 
 -----------+--------
-        50 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != d');
  estimated | actual 
 -----------+--------
-      5000 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b OR c > d');
  estimated | actual 
 -----------+--------
-      2778 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR c >= d');
  estimated | actual 
 -----------+--------
-      2778 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a AND c = d');
  estimated | actual 
 -----------+--------
-        25 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b AND c != c');
@@ -4920,7 +4920,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a AN
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b AND d >= c');
  estimated | actual 
 -----------+--------
-       556 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a OR c = d');
@@ -4932,43 +4932,43 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = a OR
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a != b OR c != c');
  estimated | actual 
 -----------+--------
-      4975 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < a OR c > d');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b OR d >= c');
  estimated | actual 
 -----------+--------
-      2778 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a < b');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a > b');
  estimated | actual 
 -----------+--------
-      1667 |      0
+         1 |      0
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a <= b');
  estimated | actual 
 -----------+--------
-      1667 |   5000
+      5000 |   5000
 (1 row)
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a >= b');
  estimated | actual 
 -----------+--------
-      1667 |   5000
+      5000 |   5000
 (1 row)
 
 DROP TABLE mcv_lists_multi;
-- 
2.34.3

0004-Don-t-treat-Var-op-Var-as-simple-clauses-in-20220721.patchtext/x-patch; charset=UTF-8; name=0004-Don-t-treat-Var-op-Var-as-simple-clauses-in-20220721.patchDownload
From 8c497b61344c499c69791571b2c8b2589c4facde Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue, 19 Jul 2022 22:05:36 +0200
Subject: [PATCH 4/4] Don't treat Var op Var as simple clauses in MCV

Until now, statext_mcv_clauselist_selectivity defined simple clauses as
those with a single attnum. This worked fine for (Var op Const) clauses,
but for (Var op Var) that's no longer correct. We need to consider them
complex even if both sides reference the same attribute.

Note: Perhaps we should just treat (Var op Var) clauses referencing the
same attribute as incompatible with extended statistics. After all it's
not providing any information about other attributes. Although, it does
restrict the relevant part of MCV, because (Var = Var) implies non-NULL.
---
 src/backend/statistics/extended_stats.c       | 95 ++++++++++++++++---
 src/backend/statistics/mcv.c                  |  4 +-
 .../statistics/extended_stats_internal.h      |  2 +-
 3 files changed, 83 insertions(+), 18 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 7bfde068aff..bbf564075c2 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1320,11 +1320,14 @@ choose_best_statistics(List *stats, char requiredkind, bool inh,
  * statext_is_compatible_clause. It needs to be split like this because
  * of recursion.  The attnums bitmap is an input/output parameter collecting
  * attribute numbers from all compatible clauses (recursively).
+ *
+ * XXX The issimple variable is expected to be initialized by the caller, we
+ * just update it while recursively analyzing the current clause.
  */
 static bool
 statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 									  Index relid, Bitmapset **attnums,
-									  List **exprs)
+									  List **exprs, bool *issimple)
 {
 	/* Look inside any binary-compatible relabeling (as in examine_variable) */
 	if (IsA(clause, RelabelType))
@@ -1372,7 +1375,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		 * Check if the expression has the right shape. This returns either one
 		 * or two expressions, depending on whether there is a Const.
 		 */
-		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL, issimple))
 			return false;
 
 		/*
@@ -1421,20 +1424,37 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/*
 		 * Check all expressions by recursing. Var expressions are handled as
 		 * a special case (to match it to attnums etc.)
+		 *
+		 * An opclause is simple if it's (Expr op Const) or (Const op Expr). We
+		 * have already checked the overall shape in examine_opclause_args, but
+		 * we haven't checked the expressions are simple (i.e. pretty much Var),
+		 * so we need to check that now. If we discover a complex expression, we
+		 * consider the whole clause complex.
 		 */
 		foreach (lc, clause_exprs)
 		{
 			Node *clause_expr = (Node *) lfirst(lc);
 
+			/*
+			 * XXX Shouldn't this try removing RelabelType, just like we do in
+			 * statext_is_compatible_clause_internal? Otherwise we might treat
+			 * Var (with a RelabelType on top) as complex expression.
+			 */
 			if (IsA(clause_expr, Var))
 			{
 				/* if the Var is incompatible, the whole clause is incompatible */
 				if (!statext_is_compatible_clause_internal(root, clause_expr,
-														   relid, attnums, exprs))
+														   relid, attnums, exprs,
+														   issimple))
 					return false;
 			}
 			else	/* generic expression */
+			{
 				*exprs = lappend(*exprs, clause_expr);
+
+				/* switch to false if there are any complex clauses */
+				*issimple = false;
+			}
 		}
 
 		return true;
@@ -1453,7 +1473,7 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			return false;
 
 		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL))
+		if (!examine_opclause_args(expr->args, &clause_exprs, NULL, NULL, issimple))
 			return false;
 
 		/* There has to be one expression exactly. */
@@ -1501,7 +1521,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/* Check Var IN Array clauses by recursing. */
 		if (IsA(clause_expr, Var))
 			return statext_is_compatible_clause_internal(root, clause_expr,
-														 relid, attnums, exprs);
+														 relid, attnums, exprs,
+														 issimple);
 
 		/* Otherwise we have Expr IN Array. */
 		*exprs = lappend(*exprs, clause_expr);
@@ -1530,6 +1551,18 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		BoolExpr   *expr = (BoolExpr *) clause;
 		ListCell   *lc;
 
+		/*
+		 * All AND/OR clauses are considered complex, even if all arguments are
+		 * simple clauses. For NOT clauses we need to check the argument and then
+		 * we can update the flag.
+		 *
+		 * XXX Maybe for AND/OR we should check if all arguments reference the
+		 * same attnum, and consider them complex only when there are multiple
+		 * attnum values (i.e. different Vars)?
+		 */
+		if (!is_notclause(clause))
+			*issimple = false;
+
 		foreach(lc, expr->args)
 		{
 			/*
@@ -1538,7 +1571,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			 */
 			if (!statext_is_compatible_clause_internal(root,
 													   (Node *) lfirst(lc),
-													   relid, attnums, exprs))
+													   relid, attnums, exprs,
+													   issimple))
 				return false;
 		}
 
@@ -1553,7 +1587,8 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 		/* Check Var IS NULL clauses by recursing. */
 		if (IsA(nt->arg, Var))
 			return statext_is_compatible_clause_internal(root, (Node *) (nt->arg),
-														 relid, attnums, exprs);
+														 relid, attnums, exprs,
+														 issimple);
 
 		/* Otherwise we have Expr IS NULL. */
 		*exprs = lappend(*exprs, nt->arg);
@@ -1589,22 +1624,35 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
  */
 static bool
 statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
-							 Bitmapset **attnums, List **exprs)
+							 Bitmapset **attnums, List **exprs, bool *issimple)
 {
 	RangeTblEntry *rte = root->simple_rte_array[relid];
 	RestrictInfo *rinfo = (RestrictInfo *) clause;
 	int			clause_relid;
 	Oid			userid;
 
+	/*
+	 * Clauses are considered simple by default, and we mark them as complex
+	 * when we discover a complex part.
+	 */
+	*issimple = true;
+
 	/*
 	 * Special-case handling for bare BoolExpr AND clauses, because the
 	 * restrictinfo machinery doesn't build RestrictInfos on top of AND
 	 * clauses.
+	 *
+	 * AND clauses are considered complex, even if all arguments are
+	 * simple clauses.
 	 */
 	if (is_andclause(clause))
 	{
 		BoolExpr   *expr = (BoolExpr *) clause;
 		ListCell   *lc;
+		bool		tmp = false;	/* ignored result */
+
+		/* AND clauses are complex, even if the arguments are simple. */
+		*issimple = false;
 
 		/*
 		 * Check that each sub-clause is compatible.  We expect these to be
@@ -1613,7 +1661,7 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 		foreach(lc, expr->args)
 		{
 			if (!statext_is_compatible_clause(root, (Node *) lfirst(lc),
-											  relid, attnums, exprs))
+											  relid, attnums, exprs, &tmp))
 				return false;
 		}
 
@@ -1635,7 +1683,7 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
 
 	/* Check the clause and determine what attributes it references. */
 	if (!statext_is_compatible_clause_internal(root, (Node *) rinfo->clause,
-											   relid, attnums, exprs))
+											   relid, attnums, exprs, issimple))
 		return false;
 
 	/*
@@ -1725,6 +1773,7 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 	ListCell   *l;
 	Bitmapset **list_attnums;	/* attnums extracted from the clause */
 	List	  **list_exprs;		/* expressions matched to any statistic */
+	bool	  *list_simple;		/* marks simple expressions */
 	int			listidx;
 	Selectivity sel = (is_or) ? 0.0 : 1.0;
 	RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
@@ -1739,6 +1788,9 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 	/* expressions extracted from complex expressions */
 	list_exprs = (List **) palloc(sizeof(Node *) * list_length(clauses));
 
+	/* expressions determined to be simple (single expression) */
+	list_simple = (bool *) palloc(sizeof(bool) * list_length(clauses));
+
 	/*
 	 * Pre-process the clauses list to extract the attnums and expressions
 	 * seen in each item.  We need to determine if there are any clauses which
@@ -1756,17 +1808,21 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 		Node	   *clause = (Node *) lfirst(l);
 		Bitmapset  *attnums = NULL;
 		List	   *exprs = NIL;
+		bool		issimple = false;
 
 		if (!bms_is_member(listidx, *estimatedclauses) &&
-			statext_is_compatible_clause(root, clause, rel->relid, &attnums, &exprs))
+			statext_is_compatible_clause(root, clause, rel->relid,
+										 &attnums, &exprs, &issimple))
 		{
 			list_attnums[listidx] = attnums;
 			list_exprs[listidx] = exprs;
+			list_simple[listidx] = issimple;
 		}
 		else
 		{
 			list_attnums[listidx] = NULL;
 			list_exprs[listidx] = NIL;
+			list_simple[listidx] = false;
 		}
 
 		listidx++;
@@ -1803,6 +1859,8 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 		listidx = -1;
 		foreach(l, clauses)
 		{
+			Node *clause = (Node *) lfirst(l);
+
 			/* Increment the index before we decide if to skip the clause. */
 			listidx++;
 
@@ -1841,13 +1899,12 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 			/* record simple clauses (single column or expression) */
 			if ((list_attnums[listidx] == NULL &&
 				 list_length(list_exprs[listidx]) == 1) ||
-				(list_exprs[listidx] == NIL &&
-				 bms_membership(list_attnums[listidx]) == BMS_SINGLETON))
+				 list_simple[listidx])
 				simple_clauses = bms_add_member(simple_clauses,
 												list_length(stat_clauses));
 
 			/* add clause to list and mark it as estimated */
-			stat_clauses = lappend(stat_clauses, (Node *) lfirst(l));
+			stat_clauses = lappend(stat_clauses, clause);
 			*estimatedclauses = bms_add_member(*estimatedclauses, listidx);
 
 			/*
@@ -2056,13 +2113,15 @@ statext_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
  * specifies on which side of the operator we found the expression node.
  */
 bool
-examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp)
+examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp,
+					  bool *issimplep)
 {
 	List	   *exprs = NIL;
 	Const	   *cst = NULL;
 	bool		expronleft;
 	Node	   *leftop,
 			   *rightop;
+	bool		issimple;
 
 	/* enforced by statext_is_compatible_clause_internal */
 	Assert(list_length(args) == 2);
@@ -2082,18 +2141,21 @@ examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp
 		exprs = lappend(exprs, leftop);
 		cst = (Const *) rightop;
 		expronleft = true;
+		issimple = true;
 	}
 	else if (IsA(leftop, Const))
 	{
 		exprs = lappend(exprs, rightop);
 		cst = (Const *) leftop;
 		expronleft = false;
+		issimple = true;
 	}
 	else
 	{
 		exprs = lappend(exprs, leftop);
 		exprs = lappend(exprs, rightop);
 		expronleft = false;
+		issimple = false;
 	}
 
 	/* return pointers to the extracted parts if requested */
@@ -2106,6 +2168,9 @@ examine_opclause_args(List *args, List **exprsp, Const **cstp, bool *expronleftp
 	if (expronleftp)
 		*expronleftp = expronleft;
 
+	if (issimplep)
+		*issimplep = (*issimplep && issimple);
+
 	return true;
 }
 
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 5806cfbb564..09a30dc1122 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -1654,7 +1654,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft, NULL))
 				elog(ERROR, "incompatible clause");
 
 			if (cst)	/* Expr op Const */
@@ -1820,7 +1820,7 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 			fmgr_info(get_opcode(expr->opno), &opproc);
 
 			/* extract the var/expr and const from the expression */
-			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft))
+			if (!examine_opclause_args(expr->args, &clause_exprs, &cst, &expronleft, NULL))
 				elog(ERROR, "incompatible clause");
 
 			/* ScalarArrayOpExpr has the Var always on the left */
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index bd4bf7fdf14..bfe8fdf264a 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -100,7 +100,7 @@ extern SortItem *build_sorted_items(StatsBuildData *data, int *nitems,
 									int numattrs, AttrNumber *attnums);
 
 extern bool examine_opclause_args(List *args, List **exprs, Const **cstp,
-								  bool *expronleftp);
+								  bool *expronleftp, bool *issimplep);
 
 extern Selectivity mcv_combine_selectivities(Selectivity simple_sel,
 											 Selectivity mcv_sel,
-- 
2.34.3

#48Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tomas Vondra (#47)
Re: Use extended statistics to estimate (Var op Var) clauses

On Thu, 21 Jul 2022 at 12:42, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

This needs to account for nullfrac, since x = x is only true if x is not null.

Right, I forgot to account for nullfrac.

Ditto variable <= variable

I don't like how matching_restriction_variables() is adding a
non-trivial amount of overhead to each of these selectivity functions,
by calling examine_variable() for each side, duplicating what
get_restriction_variable() does later on.

But matching_restriction_variables() does not use examine_variable()
anymore. It did, originally, but 0002 removed all of that. Now it does
just pull_varnos() and that's it. I admit leaving those bits unsquashed
was a bit confusing, but the first part was really 0001+0002+0003.

Ah, right, I only looked at 0001, because I thought that was the main
part that I hadn't previously looked at.

So my previous concern with matching_restriction_variables() was how
many extra cycles it was adding to test for what should be a pretty
uncommon case. Looking at the new version, I think it's a lot better,
but perhaps it would be more efficient to call equal() as soon as
you've extracted "left" and "right", so it can bail out earlier and
faster when they're not equal. I think a call to equal() should be
very fast compared to pull_varnos() and contain_volatile_functions().

Attached is a rebased and somewhat cleaned up version of the patch
series, addressing the review comments so far and squashing the bits I
previously kept separate to showcase the changes.

I've also added a bunch of regression tests - queries with (Var op Var)
clauses of varying complexity, to demonstrate the effect of each patch.
I added them as 0001, so it's clear how the individual patches affect
the results.

Cool. That's much clearer, and the results look quite good.

The main thing that jumps out at me now is the whole "issimple"
processing stuff. Those changes turned out to be a lot more invasive
than I thought. I don't think this part is correct:

/*
* All AND/OR clauses are considered complex, even if all arguments are
* simple clauses. For NOT clauses we need to check the argument and then
* we can update the flag.
*
* XXX Maybe for AND/OR we should check if all arguments reference the
* same attnum, and consider them complex only when there are multiple
* attnum values (i.e. different Vars)?
*/

I think the XXX part of that comment is right, and that's what the
original code did.

I had to go remind myself what "simple" was intended for, so apologies
if this is really pedestrian:

The basic idea of a "simple" clause was meant to mean any clause
that's likely to be better estimated by standard statistics, rather
than extended statistics (and "issimple" only applies when such
clauses are combined using "OR"). So, for example, given a query with

WHERE a = 1 OR (b > 0 AND b < 10)

both "a = 1" and "b > 0 AND b < 10" should be considered "simple",
since the standard statistics code is likely to estimate them quite
well. For the second clause, it might use histograms and the
RangeQueryClause-machinery, which ought to work well, whereas applying
a multivariate MCV list to "b > 0 AND b < 10" in isolation would
probably make its estimate worse.

So then, in this example, what the "OR" handling in
statext_mcv_clauselist_selectivity() would end up doing is:

  P(a = 1 OR (b > 0 AND b < 10))
    = P(a = 1)
    + P(b > 0 AND b < 10)
    - P(a = 1 AND b > 0 AND b < 10)   # Overlap

and only use extended statistics for the overlap term, since the other
2 clauses are "simple", and best estimated without extended stats. The
patch changes that, which ends up making things worse in some cases.

Is it the case that the only reason for changing the "issimple"
handling was because the standard statistics code didn't work well for
things like "a < a", and so we wanted to treat that as not-simple? If
so, given the selfuncs improvements, perhaps that's no longer
necessary, and the original definition of "simple" is OK. IOW, is 0004
necessary, given 0002?.

I notice that 0004 didn't change any test results, so if there are
cases where it improves things, they aren't tested.

Here's a simple example using the WHERE clause above. Without the
patch, extended stats improved the estimate, but with the patch they
don't anymore:

DROP TABLE IF EXISTS foo;
CREATE TABLE foo (a int, b int);
INSERT INTO foo SELECT x/10+1, x FROM generate_series(1,10000) g(x);
ANALYSE foo;
EXPLAIN ANALYSE SELECT * FROM foo WHERE a = 1 OR (b > 0 AND b < 10);

Estimated: 18
Actual: 9

CREATE STATISTICS foo_s (mcv) ON a,b FROM foo;
ANALYSE foo;
EXPLAIN ANALYSE SELECT * FROM foo WHERE a = 1 OR (b > 0 AND b < 10);

Estimated: 9 without the patch, 18 with the patch
Actual: 9

It's a worry that none of the existing regression tests picked up on
that. Perhaps a similar test could be added using the existing test
data. Otherwise, I think it'd be worth adding a new test with similar
data to the above.

Regards,
Dean

#49Michael Paquier
michael@paquier.xyz
In reply to: Dean Rasheed (#48)
Re: Use extended statistics to estimate (Var op Var) clauses

On Fri, Jul 22, 2022 at 02:17:47PM +0100, Dean Rasheed wrote:

It's a worry that none of the existing regression tests picked up on
that. Perhaps a similar test could be added using the existing test
data. Otherwise, I think it'd be worth adding a new test with similar
data to the above.

This feedback has not been answered for two months, so marked this
entry as RwF for now.
--
Michael